Abstract

One popular approach to soft-decision decoding of Reed-Solomon (RS) codes is based on using multiple trials of a simple RS decoding algorithm in combination with erasing or flipping a set of symbols or bits in each trial. This paper presents a framework based on rate-distortion (RD) theory to analyze these multiple-decoding algorithms. By defining an appropriate distortion measure between an error pattern and an erasure pattern, the successful decoding condition, for a single errors-and-erasures decoding trial, becomes equivalent to distortion being less than a fixed threshold. Finding the best set of erasure patterns also turns into a covering problem which can be solved asymptotically by rate-distortion theory. Thus, the proposed approach can be used to understand the asymptotic performance-versus-complexity trade-off of multiple errors-and-erasures decoding of RS codes.

This initial result is also extended in a few directions. The rate-distortion exponent (RDE) is computed to give more precise results for moderate blocklengths. Multiple trials of algebraic soft-decision (ASD) decoding are analyzed using this framework. Analytical and numerical computations of the RD and RDE functions are also presented. Finally, simulation results show that sets of erasure patterns designed using the proposed methods outperform other algorithms with the same number of decoding trials.
I. Introduction

Reed-Solomon (RS) codes are among the most popular error-correcting codes in communication and data storage systems. An $(N, K)$ RS code of length $N$ and dimension $K$ is a maximum distance separable (MDS) linear code with minimum distance $d_{\min} = N - K + 1$. RS codes have efficient hard-decision decoding (HDD) algorithms, such as the Berlekamp-Massey (BM) algorithm, which can correct up to $\lfloor (d_{\min} - 1)/2 \rfloor$ errors.

Since the discovery of RS codes [1], researchers have spent considerable effort on improving the decoding performance at the expense of complexity. A breakthrough result of Guruswami and Sudan (GS) introduced an algebraic hard-decision list-decoding algorithm, based on bivariate interpolation and factorization, that can correct errors well beyond half the minimum distance of the code [2]. Nevertheless, HDD algorithms do not fully exploit the information provided by the channel output. Koetter and Vardy (KV) later extended the GS decoder to an algebraic soft-decision (ASD) decoding algorithm by converting the probabilities observed at the channel output into algebraic interpolation conditions in terms of a multiplicity matrix [3].

The GS and KV algorithms, however, have significant computational complexity. Therefore, multiple runs of errors-and-erasures and errors-only decoding with some low-complexity algorithm, such as the BM algorithm, have renewed the interest of researchers. These algorithms use the soft information available at the channel output to construct a set of either erasure patterns [4], [5], test patterns [6], or patterns combining both [7], [8], and then attempt to decode using each pattern. Techniques have also been introduced to lower the complexity per decoding trial in [9], [10], [11]. Other soft-decision decoding algorithms for RS codes include [12], [13], which use the binary expansion of RS codes to work on the bit level. In [12], belief propagation is run while the parity-check matrix is iteratively adapted on the least reliable basis. Meanwhile, [13] adapts the generator matrix on the most reliable basis and uses reprocessing techniques based on ordered statistics.

In the scope of multiple errors-and-erasures decoding, several algorithms have been proposed that use different erasure codebooks (i.e., different sets of erasure patterns). After running the errors-and-erasures decoding algorithm multiple times, each time using one erasure pattern in the set, these algorithms produce a list of candidate codewords, whose size is usually small, and then pick the best codeword on this list. The common idea behind constructing the set of erasure patterns in these multiple errors-and-erasures decoding algorithms is to erase some of the least reliable symbols, since those symbols are more prone to be erroneous. The first algorithm of this type is called Generalized Minimum Distance (GMD) decoding [4], and it repeats errors-and-erasures decoding while successively erasing an even number of the least reliable positions (LRPs) (assuming that $d_{\min}$ is odd). More recent work by Lee and Kumar [5] proposes a soft-information successive (multiple) error-and-erasure decoding (SED) that achieves better performance but also increases the number of decoding attempts. Specifically, Lee and Kumar's SED($l$, $f$) algorithm runs multiple errors-and-erasures decoding trials with every combination of an even number $\le f$ of erasures within the $l$ LRPs.

A natural question that arises is how to construct the "best" set of erasure patterns for multiple errors-and-erasures decoding. Motivated by this question, we first develop a rate-distortion (RD) framework to analyze the asymptotic trade-off between performance and complexity of multiple errors-and-erasures decoding of RS codes. The main idea is to choose an appropriate distortion measure so that decoding is successful if and only if the distortion between the error pattern and the erasure pattern is smaller than a fixed threshold. After that, a set of erasure patterns is generated randomly (similar to random codebook generation) in order to minimize the expected minimum distortion.

One of the drawbacks of the RD approach is that the mathematical framework is only valid as the blocklength goes to infinity. Therefore, we also consider the natural extension to a rate-distortion exponent (RDE) approach that studies the behavior of the probability $p_e$ that the transmitted codeword is not on the list, as a function of the blocklength $N$. The overall error probability can be approximated by $p_e$ because the probability that the transmitted codeword is on the list but not chosen is very small compared to $p_e$. Hence, our RDE approach essentially focuses on maximizing the exponent at which the error probability decays as $N$ goes to infinity. The RDE approach can also be considered as a generalization of the RD approach, since the latter is a special case of the former when the rate-distortion exponent tends to zero. Using the RDE analysis, this approach also helps answer the following two questions: (i) What is the minimum error probability achievable for a given number of decoding attempts (or a given size of the set of erasure patterns)? (ii) What is the minimum number of decoding attempts required to achieve a certain error probability?

The RD and RDE approaches are also extended beyond conventional errors-and-erasures decoding to analyze multiple-decoding for decoding schemes such as ASD decoding. It is interesting to note that the RDE approach for ASD decoding schemes contains the special case where the codebook has exactly one entry (i.e., ASD decoding is run only once). In this case, the distribution of the codebook that maximizes the exponent implicitly generates the optimal multiplicity matrix. This is similar to the line of work [14], [15], [16], [17], where various researchers solve for a multiplicity matrix that minimizes the error probability obtained by using a Gaussian approximation [14], applying a Chernoff bound [15], [16], or using Sanov's theorem [17].

Finally, we propose a family of multiple-decoding algorithms based on these two approaches that achieve a better performance-versus-complexity trade-off than other algorithms.

A. Outline of the paper 

The paper is organized as follows. In Section II, we design an appropriate distortion measure and present a rate-distortion framework, for both the RD and RDE approaches, to analyze the performance-versus-complexity trade-off of multiple errors-and-erasures decoding of RS codes. Also in this section, we propose a general multiple-decoding algorithm that can be applied to errors-and-erasures decoding. Then, in Section III, we discuss numerical computations of RD and RDE functions, together with their complexity analyses, which are needed for the proposed algorithm. In Section IV, we analyze both bit-level and symbol-level ASD decoding and design distortion measures compatible with the general algorithm. A closed-form analysis of some RD and RDE functions is presented in Section V. Next, in Section VI, we offer some extensions that combine covering codes with random codes and also consider the case of a single decoding attempt. Simulation results are presented in Section VII and, finally, conclusions are provided in Section VIII.



II. A RD Framework For Multiple Errors-and-Erasures Decoding 

In this section, we first set up a rate-distortion framework to analyze multiple attempts of conventional hard-decision errors-and-erasures decoding.

Let $\mathbb{F}_m$, where $m$ is a power of 2, be the Galois field with $m$ elements denoted as $\alpha_1, \alpha_2, \ldots, \alpha_m$. We consider an $(N, K)$ RS code of length $N$ and dimension $K$ over $\mathbb{F}_m$. Assume that we transmit a codeword $c = (c_1, c_2, \ldots, c_N) \in \mathbb{F}_m^N$ over some channel and receive a vector $r = (r_1, r_2, \ldots, r_N) \in \mathcal{Y}^N$, where $\mathcal{Y}$ is the received alphabet for a single RS symbol. While our approach can be applied to much more general channels, our simulations focus on the additive white Gaussian noise (AWGN) channel and two common modulation formats, namely BPSK and $m$-QAM. Correspondingly, we use $\mathcal{Y} = \mathbb{R}^{\log_2 m}$ for BPSK and $\mathcal{Y} = \mathbb{R}^2$ for $m$-QAM. For each codeword index $i$, let $\varphi_i : \{1, 2, \ldots, m\} \to \{1, 2, \ldots, m\}$ be the permutation given by sorting $\pi_{i,j} = \Pr(c_i = \alpha_j \mid r_i)$ in decreasing order, so that $\pi_{i,\varphi_i(1)} \ge \pi_{i,\varphi_i(2)} \ge \cdots \ge \pi_{i,\varphi_i(m)}$. Then, we can specify $y_{i,j} = \alpha_{\varphi_i(j)}$ as the $j$-th most reliable symbol, for $j = 1, \ldots, m$, at codeword index $i$. To obtain the reliability of the codeword positions (indices), we construct the permutation $\sigma : \{1, 2, \ldots, N\} \to \{1, 2, \ldots, N\}$ given by sorting the probabilities $\pi_{i,\varphi_i(1)}$ of the most likely symbols in increasing order.¹ Thus, codeword position $\sigma(i)$ is the $i$-th LRP. These notations will be used throughout this paper.

Example 1: Consider $N = 3$ and $m = 4$. Assume that we have the probabilities $\pi_{i,j}$ written in matrix form (row $j$, column $i$) as follows:

$$\Pi = \begin{pmatrix} 0.01 & 0.01 & 0.93 \\ 0.94 & 0.03 & 0.04 \\ 0.03 & 0.49 & 0.01 \\ 0.02 & 0.47 & 0.02 \end{pmatrix}.$$

Then $\varphi_1(1,2,3,4) = (2,3,4,1)$, $\varphi_2(1,2,3,4) = (3,4,2,1)$, $\varphi_3(1,2,3,4) = (1,2,4,3)$, and $\sigma(1,2,3) = (2,3,1)$.
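As a check on Example 1, the permutations $\varphi_i$ and $\sigma$ can be recomputed by sorting. This minimal NumPy sketch (the function names are hypothetical) stores the matrix with rows indexed by symbol $j$ and columns by position $i$, and reports 1-based indices to match the text.

```python
import numpy as np

# Example 1 reliability matrix: entry [j, i] is pi_{i,j} = Pr(c_i = alpha_j | r_i).
PI = np.array([
    [0.01, 0.01, 0.93],
    [0.94, 0.03, 0.04],
    [0.03, 0.49, 0.01],
    [0.02, 0.47, 0.02],
])

def symbol_orders(Pi):
    """phi_i: symbol indices (1-based) of column i sorted by decreasing probability."""
    # argsort is ascending, so sort the negated column; "stable" keeps ties in index order.
    return [tuple(int(j) + 1 for j in np.argsort(-Pi[:, i], kind="stable"))
            for i in range(Pi.shape[1])]

def position_order(Pi):
    """sigma: positions (1-based) sorted by increasing probability of their top symbol."""
    top = Pi.max(axis=0)
    return tuple(int(i) + 1 for i in np.argsort(top, kind="stable"))

phi = symbol_orders(PI)
sigma = position_order(PI)
print(phi)    # [(2, 3, 4, 1), (3, 4, 2, 1), (1, 2, 4, 3)]
print(sigma)  # (2, 3, 1)
```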

Condition 1: (Classical decoding threshold, see [18], [19]): If $e$ symbols are erased, a conventional hard-decision errors-and-erasures decoder such as the BM algorithm is able to correct $\nu$ errors in unerased positions if and only if

$$2\nu + e < N - K + 1. \qquad (1)$$

A. Conventional error patterns and erasure patterns. 

Definition 1: (Conventional error patterns and erasure patterns) We define $x^N \in \mathbb{Z}_2^N = \{0, 1\}^N$ and $\hat{x}^N \in \mathbb{Z}_2^N$ as an error pattern and an erasure pattern, respectively, where $x_i = 0$ means that an error occurs (i.e., the most likely symbol is incorrect) and $\hat{x}_i = 0$ means that the symbol at index $i$ is erased (i.e., an erasure is applied at index $i$). $X^N$ and $\hat{X}^N$ will be used to denote the random vectors which generate the realizations $x^N$ and $\hat{x}^N$, respectively.

Example 2: If $d_{\min}$ is odd, then the GMD algorithm corresponds to the set

$$\{111111\ldots,\ 001111\ldots,\ 000011\ldots,\ \ldots,\ 00\ldots0\,11\ldots1\}$$

of erasure patterns. Meanwhile, SED(3,2) uses the following set

$$\{111111\ldots,\ 001111\ldots,\ 010111\ldots,\ 100111\ldots\}.$$

Here, in each erasure pattern, the letters are written in increasing reliability order of the codeword positions.
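The two erasure codebooks in Example 2 can be enumerated mechanically. The Python sketch below is illustrative only: the helper names `gmd_patterns` and `sed_patterns` are hypothetical, patterns are strings in increasing-reliability order with `0` marking an erased position, and the GMD set is assumed to stop after erasing $d_{\min} - 1$ positions.

```python
from itertools import combinations

def gmd_patterns(N, dmin):
    """GMD erasure patterns (0 = erased, 1 = kept), least reliable position first.
    Assumes odd dmin; erases 0, 2, ..., dmin - 1 least reliable positions."""
    return ["0" * e + "1" * (N - e) for e in range(0, dmin, 2)]

def sed_patterns(N, l, f):
    """SED(l, f): every way to erase an even number (at most f) of the l LRPs."""
    patterns = []
    for e in range(0, f + 1, 2):
        for erased in combinations(range(l), e):
            patterns.append("".join("0" if i in erased else "1" for i in range(N)))
    return patterns

print(gmd_patterns(7, 5))     # ['1111111', '0011111', '0000111']
print(sed_patterns(6, 3, 2))  # ['111111', '001111', '010111', '100111']
```

The SED(3,2) output reproduces the four patterns listed in Example 2 in the same order.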

Let us revisit the question of how to construct the best set of erasure patterns for multiple errors-and-erasures decoding. First, it can be seen that multiple errors-and-erasures decoding succeeds if condition (1) is satisfied during at least one round of decoding. Thus, our approach is to design a distortion measure that converts condition (1) into a form where the distortion between an error pattern $x^N$ and an erasure pattern $\hat{x}^N$, denoted as $d(x^N, \hat{x}^N)$, is less than a fixed threshold.

¹Other measures such as entropy or the average number of guesses might improve Algorithm B in Section II-C.

November 9, 2010 DRAFT

Definition 2: Given a letter-by-letter distortion measure $\delta$, the distortion between an error pattern $x^N$ and an erasure pattern $\hat{x}^N$ is defined by

$$d(x^N, \hat{x}^N) = \sum_{i=1}^{N} \delta(x_i, \hat{x}_i).$$

Proposition 1: If we choose the letter-by-letter distortion measure $\delta : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}_{\ge 0}$, where in this case $\mathcal{X} = \hat{\mathcal{X}} = \mathbb{Z}_2$, as follows:

$$\delta(0,0) = 1, \quad \delta(0,1) = 2, \quad \delta(1,0) = 1, \quad \delta(1,1) = 0, \qquad (2)$$

then condition (1) for a successful errors-and-erasures decoding is equivalent to

$$d(x^N, \hat{x}^N) < N - K + 1, \qquad (3)$$

where the distortion is less than a fixed threshold.
Proof: First, we define

$$\chi_{j,k} = |\{i \in \{1, 2, \ldots, N\} : x_i = j,\ \hat{x}_i = k\}|$$

to count the number of $(x_i, \hat{x}_i)$ pairs equal to $(j, k)$ for every $j \in \mathcal{X}$ and $k \in \hat{\mathcal{X}}$. With the chosen distortion measure, we have

$$d(x^N, \hat{x}^N) = 2\chi_{0,1} + \chi_{0,0} + \chi_{1,0}.$$

Noticing that $e = \chi_{0,0} + \chi_{1,0}$ and $\nu = \chi_{0,1}$, condition (1) for one errors-and-erasures decoding attempt to succeed becomes $2\chi_{0,1} + \chi_{0,0} + \chi_{1,0} < N - K + 1$, which is equivalent to $d(x^N, \hat{x}^N) < N - K + 1$. ∎
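Proposition 1 can also be confirmed by brute force for small codes. The sketch below is an illustrative check under the conventions of Definition 1 (0 marks an error in $x$ and an erasure in $\hat{x}$); it compares $d(x^N, \hat{x}^N) < N - K + 1$ against condition (1) for every pair of patterns. The function names are hypothetical.

```python
from itertools import product

# Letter-by-letter distortion (2): DELTA[(x_i, xhat_i)].
DELTA = {(0, 0): 1, (0, 1): 2, (1, 0): 1, (1, 1): 0}

def distortion(x, xhat):
    return sum(DELTA[(a, b)] for a, b in zip(x, xhat))

def decodable(x, xhat, N, K):
    """Condition (1): 2 * nu + e < N - K + 1."""
    e = sum(1 for b in xhat if b == 0)                          # erased positions
    nu = sum(1 for a, b in zip(x, xhat) if a == 0 and b == 1)   # unerased errors
    return 2 * nu + e < N - K + 1

def check_proposition_1(N, K):
    """Exhaustively confirm that (3) holds exactly when (1) holds."""
    return all(
        (distortion(x, xh) < N - K + 1) == decodable(x, xh, N, K)
        for x in product((0, 1), repeat=N)
        for xh in product((0, 1), repeat=N)
    )

print(check_proposition_1(7, 3))  # True
```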

Next, we try to maximize the chance that this successful decoding condition is satisfied by at least one of the decoding attempts (i.e., $d(x^N, \hat{x}^N) < N - K + 1$ for at least one erasure pattern $\hat{x}^N$). Mathematically, we want to build a set $\mathcal{B}$ of no more than $2^R$ erasure patterns $\hat{x}^N$ that achieves the maximum

$$\max_{\mathcal{B} : |\mathcal{B}| \le 2^R} \Pr\left\{ \min_{\hat{x}^N \in \mathcal{B}} d(X^N, \hat{x}^N) < N - K + 1 \right\}.$$

Solving this problem exactly is very difficult. However, one can observe that it is a covering problem where one tries to cover the most likely error patterns using a fixed number of spheres centered at the chosen erasure patterns. This view leads to two asymptotic solutions of the problem based on rate-distortion theory. Taking this point of view, we regard the error pattern $x^N$ as a source sequence and the erasure pattern $\hat{x}^N$ as a reproduction sequence.
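For very short codes, the objective of this covering problem can be evaluated exactly by enumerating all $2^N$ error patterns. The sketch below assumes, as elsewhere in the paper, independent positions, with `p_correct[i]` a hypothetical name for $\Pr(X_i = 1)$; it is meant only to make the objective concrete, since the enumeration is exponential in $N$.

```python
from itertools import product

DELTA = {(0, 0): 1, (0, 1): 2, (1, 0): 1, (1, 1): 0}   # distortion measure (2)

def success_probability(B, p_correct, N, K):
    """Pr{ min over xhat in B of d(X, xhat) < N - K + 1 }, by full enumeration.

    B: list of erasure patterns (tuples of 0/1); p_correct[i] = Pr(X_i = 1).
    """
    total = 0.0
    for x in product((0, 1), repeat=N):
        prob = 1.0
        for xi, pi in zip(x, p_correct):
            prob *= pi if xi == 1 else 1.0 - pi
        best = min(sum(DELTA[(a, b)] for a, b in zip(x, xh)) for xh in B)
        if best < N - K + 1:        # at least one trial satisfies condition (3)
            total += prob
    return total

N, K = 7, 3
p = [0.9] * N
hd_only = [(1,) * N]                                   # a single errors-only trial
gmd = [(1,) * 7, (0, 0, 1, 1, 1, 1, 1), (0, 0, 0, 0, 1, 1, 1)]
print(success_probability(hd_only, p, N, K) <= success_probability(gmd, p, N, K))  # True
```

With only the all-ones pattern, success reduces to at most $\lfloor (N-K)/2 \rfloor = 2$ errors, which gives a simple sanity check on the enumeration.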
Fig. 1. Pictorial illustration of a covering problem (□: error pattern; •: erasure pattern).

1) RD approach: Rate-distortion theory (see [20, Chapter 13]) characterizes the trade-off between $\bar{R}$ and $\bar{D}$ such that sets $\mathcal{B}$ of $2^{N\bar{R}}$ reproduction sequences exist (and can be generated randomly) so that

$$\lim_{N \to \infty} \mathbb{E}\left[ \frac{1}{N} \min_{\hat{x}^N \in \mathcal{B}} d(X^N, \hat{x}^N) \right] \le \bar{D}.$$

Under mild conditions, this implies that, for large enough $N$, we have

$$\min_{\hat{x}^N \in \mathcal{B}} d(X^N, \hat{x}^N) \lesssim N\bar{D}$$

with high probability. Here, $\bar{R}$ and $\bar{D}$ are closely related to the complexity and the performance, respectively, of the decoding algorithm. Therefore, we characterize the trade-off between those two aspects using the relationship between $\bar{R}$ and $\bar{D}$. In this paper, we denote the rate and distortion by $R$ and $D$, respectively, using unnormalized quantities, i.e., $R = N\bar{R}$ and $D = N\bar{D}$.

2) RDE approach: The above-mentioned RD approach focuses on minimizing the average minimum distortion with little knowledge of how the tail of the distribution behaves. In the RDE approach, we instead focus on directly minimizing the probability that the minimum distortion is not less than the predetermined threshold $D = N - K + 1$ (due to condition (3)) with the help of an error-exponent analysis. The exact probability of interest is

$$p_e \triangleq \Pr\left\{ X^N : \min_{\hat{x}^N \in \mathcal{B}} d(X^N, \hat{x}^N) \ge D \right\},$$

which reflects how likely the decoding threshold (1) is to fail. In other words, every error pattern $x^N$ can be covered by a sphere centered at an erasure pattern $\hat{x}^N$ except for a set of error patterns of probability $p_e$. The RDE analysis shows that $p_e$ decays exponentially as $N \to \infty$, and the maximum exponent attainable is the RDE function $F(R, D)$. Throughout this paper, we denote the rate-distortion exponent by $F(R, D)$ using unnormalized quantities (i.e., without dividing by $N$), and note that the exponent used by other authors in [21], [22], [23] is often the normalized version $\bar{F}(\bar{R}, \bar{D}) = F(N\bar{R}, N\bar{D})/N$.

RDE analysis is discussed extensively in [21], [22], and it is shown that a set $\mathcal{B}$ of roughly $2^{N\bar{R}}$ codewords, generated randomly using the test-channel input distribution, can be used to achieve $F(R, D)$. An upper bound is also given that shows, for any $\epsilon > 0$, there is a sufficiently large $N$ (see [24, p. 229]) such that

$$p_e \le 2^{-N[\bar{F}(\bar{R}, \bar{D}) - \epsilon]}.$$

An exponentially tight lower bound for $p_e$ can also be obtained (see [24, p. 236]), and it implies that the best sequence of codebooks satisfies

$$\lim_{N \to \infty} -\frac{1}{N} \log p_e = \bar{F}(\bar{R}, \bar{D}).$$

Remark 1: The RDE approach possesses several advantages. First, the converse of the RDE [24, p. 236] provides a lower bound for $p_e$. This implies that, given an arbitrary set $\mathcal{B}$ of roughly $2^{N\bar{R}}$ erasure patterns and any $\epsilon > 0$, the probability $p_e$ cannot be made lower than $2^{-N[\bar{F}(\bar{R}, \bar{D}) + \epsilon]}$ for $N$ large enough. Thus, no matter how one chooses the set $\mathcal{B}$ of erasure patterns, the difference between the induced probability of error and the $p_e$ of the RDE approach becomes negligible for $N$ large enough. Second, it can help one estimate the smallest number of decoding attempts needed to reach an RDE of $F$ (or an error probability of roughly $2^{-F}$) or, similarly, allow one to estimate the RDE (and error probability) for a fixed number of decoding attempts.

B. Generalized error patterns and erasure patterns 

In this subsection, we consider a generalization of the conventional error patterns and erasure patterns under the same framework to make better use of the soft information. At each index of the RS codeword, besides erasing a symbol, we also try to decode using not only the most likely symbol but also less likely ones as the hard-decision (HD) symbol. To handle up to the $\ell$ most likely symbols at each index $i$, we let $\mathbb{Z}_{\ell+1} = \{0, 1, \ldots, \ell\}$ and consider the following definition.

Definition 3: (Generalized error patterns and erasure patterns) Consider a positive integer $\ell$ smaller than the field size $m$. Let $x^N \in \mathbb{Z}_{\ell+1}^N$ be a generalized error pattern where, at index $i$, $x_i = j$ implies that the $j$-th most likely symbol is correct for $j \in \{1, 2, \ldots, \ell\}$, and $x_i = 0$ implies that none of the first $\ell$ most likely symbols is correct. Let $\hat{x}^N \in \mathbb{Z}_{\ell+1}^N$ be a generalized erasure pattern used for decoding where, at index $i$, $\hat{x}_i = k$ implies that the $k$-th most likely symbol is used as the hard-decision symbol for $k \in \{1, 2, \ldots, \ell\}$, and $\hat{x}_i = 0$ implies that an erasure is used at that index.
For simplicity, we refer to $x^N$ as the error pattern and $\hat{x}^N$ as the erasure pattern, as in the conventional case. Now, we need to convert condition (1) to the form where $d(x^N, \hat{x}^N)$ is less than a fixed threshold. Proposition 1 is thereby generalized into the following proposition.

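Proposition 2: We choose the letter-by-letter distortion measure $\delta : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}_{\ge 0}$, where in this case $\mathcal{X} = \hat{\mathcal{X}} = \mathbb{Z}_{\ell+1}$, defined by $\delta(x, \hat{x}) = [A]_{x,\hat{x}}$ in terms of the $(\ell+1) \times (\ell+1)$ matrix (rows and columns indexed from 0)

$$A = \begin{pmatrix} 1 & 2 & 2 & \cdots & 2 \\ 1 & 0 & 2 & \cdots & 2 \\ 1 & 2 & 0 & \cdots & 2 \\ \vdots & \vdots & & \ddots & \vdots \\ 1 & 2 & 2 & \cdots & 0 \end{pmatrix}. \qquad (4)$$

Using this, condition (1) for a successful errors-and-erasures decoding is equivalent to

$$d(x^N, \hat{x}^N) < N - K + 1.$$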

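Proof: The reasoning is very similar to the proof of Proposition 1, using the fact that $e = \sum_{j=0}^{\ell} \chi_{j,0}$ and $\nu = \sum_{k=1}^{\ell} \sum_{j=0, j \ne k}^{\ell} \chi_{j,k}$, where $\chi_{j,k} = |\{i \in \{1, 2, \ldots, N\} : x_i = j,\ \hat{x}_i = k\}|$ for every $j, k \in \mathbb{Z}_{\ell+1}$. With the matrix (4), this gives $d(x^N, \hat{x}^N) = e + 2\nu$. ∎

For each $\ell = 1, 2, \ldots, m$, we will refer to this generalized case as mBM-$\ell$ decoding.

Example 3: Consider mBM-2 (or top-$\ell$ decoding with $\ell = 2$). In this case, the distortion measure is given by the following matrix:

$$A = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{pmatrix}.$$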

Remark 2: The distortion measure matrix changes slightly if we use errors-only decoding instead of errors-and-erasures decoding. In this case, $\hat{\mathcal{X}} = \mathbb{Z}_{\ell+1} \setminus \{0\}$ and the chosen letter-by-letter distortion measure is given in terms of the $(\ell+1) \times \ell$ matrix obtained by deleting the first column of (4). When $\ell = 2$, we consider the first and second most likely symbols as the two hard-decision symbols at each codeword position. This is similar to the Chase-type decoding method proposed by Bellorado and Kavcic [9]. Das and Vardy also suggest this approach by considering only several of the highest entries in each column of the reliability matrix $\Pi$ for single ASD decoding of RS codes [17].

C. Proposed General Multiple-Decoding Algorithm 

In this subsection, we propose two general multiple-decoding algorithms for RS codes. In each algorithm, one can choose either Step 2a, which corresponds to the RD approach, or Step 2b, which corresponds to the RDE approach. These general algorithms apply not only to multiple errors-and-erasures decoding but also to multiple-decoding of other decoding schemes that we will discuss later. The common first step is designing a distortion measure $\delta : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}_{\ge 0}$ that converts the condition for a single decoding to succeed into the form where distortion is less than a fixed threshold. After that, decoding proceeds as described below.

1) Algorithm A: 

Step 1: Based on the received signal sequence, compute an $m \times N$ reliability matrix $\Pi$ where $[\Pi]_{j,i} = \pi_{i,j}$. From this, determine the probability matrix $P$ where $p_{i,j} = \Pr(X_i = j)$ for $i = 1, 2, \ldots, N$ and $j \in \mathcal{X}$.

Step 2a: (RD approach) Compute the RD function of a source sequence (error pattern) with probability of source letters derived from $P$ and the chosen distortion measure (see Section III and Section V). Given the design rate $R$, determine the optimal input-probability distribution matrix $Q$ for the test channel, with entries $q_{i,k} = \Pr(\hat{X}_i = k)$ for $i = 1, 2, \ldots, N$ and $k \in \hat{\mathcal{X}}$.

Step 2b: (RDE approach) Given $D$ (in most cases $D = N - K + 1$) and the design rate $R$, compute the RDE function of a source sequence (error pattern) with probability of source letters derived from $P$ and the chosen distortion measure (see Section III and Section V). Also determine the optimal input-probability distribution matrix $Q$ for the test channel, with entries $q_{i,k} = \Pr(\hat{X}_i = k)$ for $i = 1, 2, \ldots, N$ and $k \in \hat{\mathcal{X}}$.

Step 3: Randomly generate a set of $2^R$ erasure patterns using the test-channel input-probability distribution matrix $Q$.

Step 4: Run multiple attempts of the corresponding decoding scheme (e.g., errors-and-erasures decoding) using the set of erasure patterns in Step 3 to produce a list of candidate codewords.

Step 5: Use the maximum-likelihood (ML) rule to pick the best codeword on the list. 
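A rough NumPy sketch of Steps 1 and 3 of Algorithm A follows. It assumes the generalized error pattern of Definition 3 (so $p_{i,j}$ for $j \ge 1$ is the probability of the $j$-th most likely symbol at position $i$, and $p_{i,0}$ is the remaining mass), and it takes $Q$ as given, since Steps 2a/2b are outside the sketch; the $Q$ below is a made-up placeholder and the function names are hypothetical.

```python
import numpy as np

def error_pattern_probs(Pi, ell=1):
    """Step 1 (sketch): P[i, j] = Pr(X_i = j) from an m x N reliability matrix Pi."""
    top = -np.sort(-Pi, axis=0)[:ell, :].T          # (N, ell): top-ell probabilities
    return np.column_stack([1.0 - top.sum(axis=1), top])

def random_erasure_patterns(Q, num_patterns, rng):
    """Step 3 (sketch): draw xhat_i ~ Q[i, :] independently for each position i."""
    N, L = Q.shape
    return np.array([[rng.choice(L, p=Q[i]) for i in range(N)]
                     for _ in range(num_patterns)])

# Example 1 reliability matrix; Q is a placeholder for the output of Step 2a/2b.
Pi = np.array([[0.01, 0.01, 0.93],
               [0.94, 0.03, 0.04],
               [0.03, 0.49, 0.01],
               [0.02, 0.47, 0.02]])
P = error_pattern_probs(Pi)                         # rows ~ (0.06, 0.94), (0.51, 0.49), (0.07, 0.93)
Q = np.array([[0.1, 0.9], [0.7, 0.3], [0.1, 0.9]])
patterns = random_erasure_patterns(Q, 2 ** 3, np.random.default_rng(0))  # R = 3: 8 patterns
```

Each row of `patterns` would then be handed to one errors-and-erasures decoding attempt (Step 4), and the ML rule picks among the resulting candidates (Step 5).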

Remark 3: In Algorithm A, the RD (or RDE) function is computed on the fly, i.e., after every received 
signal sequence. In practice, it may be preferable to precompute the RD (or RDE) function based on 
the empirical distribution measured from the channel. We refer to this approach as Algorithm B, and 
simulation results show a negligible difference in the performance of these two algorithms. 

2) Algorithm B: 

Step 1: Transmit $\tau$ (e.g., $\tau = 10^{} – 10^{}$) arbitrary test RS codewords, indexed by time $t = 1, 2, \ldots, \tau$, over the channel and compute a set of $\tau$ $m \times N$ matrices $\Pi_t$ where $[\Pi_t]_{j,i} = \pi^{(t)}_{i,\varphi_i(j)}$ is the probability of the $j$-th most likely symbol at position $i$ during time $t$. For each time $t$, obtain the matrix $\Pi_s^{(t)}$ from $\Pi_t$ through a permutation $\sigma^{(t)} : \{1, 2, \ldots, N\} \to \{1, 2, \ldots, N\}$ that sorts the probabilities $\pi^{(t)}_{i,\varphi_i(1)}$ in increasing order to indicate the reliability order of codeword positions. Take the entry-wise average of all $\tau$ matrices $\Pi_s^{(t)}$ to get an average matrix $\bar{\Pi}$.² The matrix $\bar{\Pi}$ serves as $\Pi$ in Algorithm A and, from this, determine the probability matrix $P$ where $p_{i,j} = \Pr(X_i = j)$ for $i = 1, 2, \ldots, N$ and $j \in \mathcal{X}$.

Step 2a: (RD approach) Compute the RD function of a source sequence (error pattern) with probability of source letters derived from $P$ and the chosen distortion measure. Given a design rate $R$, determine the test-channel input-probability distribution matrix $Q$ where $q_{i,k} = \Pr(\hat{X}_i = k)$ for $i = 1, 2, \ldots, N$ and $k \in \hat{\mathcal{X}}$.

Step 2b: (RDE approach) Given $D$ (in most cases $D = N - K + 1$) and the design rate $R$, compute the RDE function of a source sequence (error pattern) with probability of source letters derived from $P$ and the chosen distortion measure. Also determine the optimal test-channel input-probability distribution matrix $Q$ where $q_{i,k} = \Pr(\hat{X}_i = k)$ for $i = 1, 2, \ldots, N$ and $k \in \hat{\mathcal{X}}$.

Step 3: Based on the actual received signal sequence, compute $\pi_{i,\varphi_i(1)}$ and determine the permutation $\sigma$ that gives the reliability order of codeword positions by sorting $\pi_{i,\varphi_i(1)}$ in increasing order.

Step 4: Randomly generate a set of $2^R$ erasure patterns using the test-channel input-probability distribution matrix $Q$ and permute the indices of each erasure pattern by the permutation $\sigma^{-1}$.

Step 5: Run multiple attempts of the corresponding decoding scheme (e.g., errors-and-erasures decoding) using the set of erasure patterns in Step 4 to produce a list of candidate codewords.

Step 6: Use the ML rule to pick the best codeword on the list. 
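The bookkeeping in Algorithm B (sorting, averaging, and mapping patterns back to codeword positions) might look as follows. This is a sketch under stated assumptions: indices are 0-based, `sigma[i]` is the codeword position of the $i$-th LRP, the average is kept as a running mean so the individual matrices need not be stored, and all names are hypothetical.

```python
import numpy as np

def sort_reliability(Pi):
    """Return (Pi_s, sigma) for one m x N reliability matrix.

    Columns are sorted in decreasing order (row j = j-th most likely symbol) and then
    reordered by increasing top probability, so column i of Pi_s is the i-th LRP,
    located at codeword position sigma[i].
    """
    Pt = -np.sort(-Pi, axis=0)
    sigma = np.argsort(Pt[0], kind="stable")
    return Pt[:, sigma], sigma

def average_sorted_matrix(Pi_list):
    """Step 1 (sketch): running entry-wise mean of the sorted matrices."""
    avg = None
    for t, Pi in enumerate(Pi_list, start=1):
        Ps, _ = sort_reliability(Pi)
        avg = Ps.copy() if avg is None else avg + (Ps - avg) / t
    return avg

def to_codeword_order(pattern, sigma):
    """Step 4 (sketch): letter i of a reliability-ordered pattern goes to position sigma[i]."""
    out = np.empty_like(pattern)
    out[sigma] = pattern
    return out
```

For the matrix of Example 1, `sigma` comes out as `[1, 2, 0]`, the 0-based form of $\sigma(1,2,3) = (2,3,1)$, and a pattern that erases only the first LRP maps to `[1, 0, 1]` in codeword order.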

III. Computing The RD and RDE Functions 

In this section, we discuss some numerical methods to compute the RD and RDE functions and the corresponding test-channel input-probability distribution matrix $Q$, whose entries are $q_{i,k} = \Pr(\hat{X}_i = k)$ for $i = 1, 2, \ldots, N$ and $k \in \hat{\mathcal{X}}$. These numerical methods allow us to efficiently compute the RD and RDE functions discussed in the previous section for arbitrary discrete distortion measures. For some simple distortion measures, closed-form solutions are given in Section V.

A. Computing the RD function 

For an arbitrary discrete distortion measure, it can be difficult to compute the RD function analytically. Fortunately, for a single source $X$, the Blahut algorithm (see details in [25]) gives an alternating minimization technique that efficiently computes the RD function, which is given by³

$$R(D) = \min_{w \in \mathcal{W}_D} \sum_{j} \sum_{k} p_j w_{k|j} \log \frac{w_{k|j}}{q_k}$$

where $p_j \triangleq \Pr(X = j)$, $q_k \triangleq \Pr(\hat{X} = k)$, $w_{k|j} \triangleq \Pr(\hat{X} = k \mid X = j)$, and⁴

$$\mathcal{W}_D = \left\{ w : w_{k|j} \ge 0,\ \sum_k w_{k|j} = 1,\ \sum_j \sum_k p_j w_{k|j} \delta_{jk} \le D \right\}.$$

²In fact, one need not store each matrix $\Pi_s^{(t)}$ separately; the average $\bar{\Pi}$ can be computed on the fly.
More precisely, given the Lagrange multiplier $t < 0$ that represents the slope of the RD curve at a specific point (see [26, Thm 2.5.1]) and an arbitrary all-positive initial test-channel input-probability distribution vector $q^{(0)}$, the Blahut algorithm shows us how to compute the rate-distortion pair $(R_t, D_t)$.

However, it is not straightforward to apply the Blahut algorithm to compute the RD function for a discrete source sequence $x^N$ (an error pattern in our context) of $N$ independent but not necessarily identical (i.n.d.) source components $x_i$. In order to do that, we consider the group of source letters $(j_1, j_2, \ldots, j_N)$, where $j_i \in \mathcal{X}$, as a super-source letter $\mathcal{J} \in \mathcal{X}^N$, the group of reproduction letters $(k_1, k_2, \ldots, k_N)$, where $k_i \in \hat{\mathcal{X}}$, as a super-reproduction letter $\mathcal{K} \in \hat{\mathcal{X}}^N$, and the source sequence $x^N$ as a single source. For each super-source letter $\mathcal{J}$, $p_{\mathcal{J}} = \Pr(X^N = \mathcal{J}) = \prod_{i=1}^{N} \Pr(X_i = j_i) = \prod_{i=1}^{N} p_{j_i}$ follows from the independence of the source components.⁵

While we could apply the Blahut algorithm to this source directly, complexity is a problem because the alphabet sizes for $\mathcal{J}$ and $\mathcal{K}$ become the super-alphabet sizes $|\mathcal{X}|^N$ and $|\hat{\mathcal{X}}|^N$, respectively. Instead, we avoid this computational challenge by choosing the initial test-channel input-probability distribution so that it can be factored into a product of $N$ initial test-channel input-probability components, i.e., $q^{(0)}_{\mathcal{K}} = \prod_{i=1}^{N} q^{(0)}_{k_i}$. One can verify that this factorization rule still applies after every step $r$ of the iterative process, i.e., $q^{(r)}_{\mathcal{K}} = \prod_{i=1}^{N} q^{(r)}_{k_i}$. Therefore, the convergence of the Blahut algorithm [27] implies that the optimal distribution is a product distribution, i.e., $q^{*}_{\mathcal{K}} = \prod_{i=1}^{N} q^{*}_{k_i}$.

One can also find that, for each parameter $t$, one only needs to compute the rate-distortion pair for each source component $x_i$ separately and sum the results. This is captured in the following algorithm.

Algorithm 1: (Factored Blahut algorithm for RD function) Consider a discrete source sequence $x^N$ of $N$ i.n.d. source components $x_i$ with probability $p_{j_i} = \Pr(X_i = j_i)$. Given a parameter $t < 0$, the rate and the distortion for this source sequence under a specified distortion measure are given by

$$R_t = \sum_{i=1}^{N} R_{i,t} \quad \text{and} \quad D_t = \sum_{i=1}^{N} D_{i,t} \qquad (5)$$

where the components $R_{i,t}$ and $D_{i,t}$ are computed by the Blahut algorithm with the Lagrange multiplier $t$. This rate-distortion pair can be achieved by the corresponding test-channel input-probability distribution $q_{\mathcal{K}} = \Pr(\hat{X}^N = \mathcal{K}) = \prod_{i=1}^{N} q_{k_i}$, where the component probability distribution is $q_{k_i} = \Pr(\hat{X}_i = k_i)$.

³All logarithms in this paper are taken to base 2.

⁴$\delta(j, k)$ is sometimes written as $\delta_{jk}$ for convenience.

⁵In this paper, the notations $p_{j_i}$ and $p_{i,j}$ are interchangeable. The notations $q_{k_i}$ and $q_{i,k}$ are also interchangeable.
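A compact sketch of the per-component Blahut iteration and the summation in (5) is given below. It assumes base-2 logarithms (so the slope parameter enters as $2^{t\delta_{jk}}$), a fixed iteration count in place of a convergence test, and a uniform initial $q$; the function names are hypothetical. For a uniform binary source with Hamming distortion and $t = -2$, it returns $D_t = 0.2$ and $R_t = 1 - h(0.2)$, as expected from the closed-form RD curve.

```python
import numpy as np

def blahut_rd(p, delta, t, q0=None, iters=500):
    """Blahut iteration for one source component at slope parameter t < 0.

    p: pmf over X; delta: |X| x |Xhat| distortion matrix; q0: initial pmf over Xhat
    (uniform if None). Returns (R_t in bits, D_t, q).
    """
    p = np.asarray(p, dtype=float)
    delta = np.asarray(delta, dtype=float)
    A = np.exp2(t * delta)                              # 2^{t * delta_jk}
    n_out = delta.shape[1]
    q = np.full(n_out, 1.0 / n_out) if q0 is None else np.asarray(q0, dtype=float)
    for _ in range(iters):
        w = q * A
        w /= w.sum(axis=1, keepdims=True)               # w_{k|j} proportional to q_k 2^{t delta_jk}
        q = p @ w                                       # q_k = sum_j p_j w_{k|j}
    w = q * A
    w /= w.sum(axis=1, keepdims=True)
    joint = p[:, None] * w
    D = float(np.sum(joint * delta))
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(joint > 0, joint * np.log2(w / q), 0.0)
    return float(terms.sum()), D, q

def factored_blahut_rd(P, delta, t, iters=500):
    """Algorithm 1 (sketch): run each component separately and add up, as in (5).
    Row i of P is the pmf of X_i."""
    results = [blahut_rd(p_i, delta, t, iters=iters) for p_i in P]
    R_t = sum(r for r, _, _ in results)
    D_t = sum(d for _, d, _ in results)
    Q = np.vstack([q for _, _, q in results])           # row i: q_{i,k} = Pr(Xhat_i = k)
    return R_t, D_t, Q
```

For the error patterns of Proposition 1, each row of `P` would be $(\Pr(X_i = 0), \Pr(X_i = 1))$ and `delta` would be `[[1, 2], [1, 0]]`.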

Remark 4: Equation (5) can also be derived from [26, Corollary 2.8.3] in a way that does not use the convergence property of the Blahut algorithm.

B. Computing the RDE function 

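The original RDE function $F(R, D)$, defined in [21, Sec. VI] for a single source $X$, is given by

$$F(R, D) = \min_{\tilde{p} :\ R(\tilde{p}, D) \ge R} \ \sum_j \tilde{p}_j \log \frac{\tilde{p}_j}{p_j} \qquad (6)$$

where $p_j = \Pr(X = j)$, $\tilde{p}$ ranges over distributions on $\mathcal{X}$, and $R(\tilde{p}, D)$ is the RD function of a source with distribution $\tilde{p}$, i.e., the minimum of $\sum_j \sum_k \tilde{p}_j w_{k|j} \log (w_{k|j}/q_k)$ over test channels $w_{k|j} = \Pr(\hat{X} = k \mid X = j)$ satisfying $\sum_j \sum_k \tilde{p}_j w_{k|j} \delta_{jk} \le D$, with $q_k = \Pr(\hat{X} = k) = \sum_j \tilde{p}_j w_{k|j}$.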

For a single source $X$, given two parameters $s > 0$ and $t < 0$, which are the Lagrange multipliers introduced in the optimization problem (see [21, p. 415]), the Arimoto algorithm given in [28, Sec. V] can be used to compute the exponent, rate, and distortion numerically.

In the context we consider, the source (error pattern) $x^N$ comprises i.n.d. source components $x_i$. We follow the same method as in the RD function case, i.e., we choose the initial distribution arbitrarily but subject to the factorization rule $q^{(0)}_{\mathcal{K}} = \prod_{i=1}^{N} q^{(0)}_{k_i}$, and this gives the following algorithm.

Algorithm 2: (Factored Arimoto algorithm for RDE function) Consider a discrete source $x^N$ of i.n.d. source components $x_i$ with probability $p_{j_i} = \Pr(X_i = j_i)$. Given Lagrange multipliers $s > 0$ and $t < 0$, the exponent, rate, and distortion under a specified distortion measure are given by

$$F_{|s,t} = \sum_{i=1}^{N} F_{i|s,t}, \quad R_{|s,t} = \sum_{i=1}^{N} R_{i|s,t}, \quad D_{|s,t} = \sum_{i=1}^{N} D_{i|s,t}$$

where the components $F_{i|s,t}$, $R_{i|s,t}$, and $D_{i|s,t}$ are computed parametrically by the Arimoto algorithm.

Remark 5: Though it is standard practice to compute error-exponents using the implicit form given 
above, this approach may provide points that, while achievable, are strictly below the true RDE curve. The 
problem is that the true RDE curve may have a slope discontinuity that forces the implicit representation 
to have extra points. An example of this behavior for the channel coding error exponent is given by Gallager [29, p. 147]. For the i.n.d. source considered above, a cautious person could solve the problem
as described and then check that the component RDE functions are differentiable at the optimum point. 
In this work, we largely neglect this subtlety. 

C. Complexity of computing RD/RDE functions 

1) Complexity of computing the RD function: For each parameter $t < 0$, if we directly apply the original Blahut algorithm to compute the $(R_t, D_t)$ pair, the complexity is $O(T_{\max} |\mathcal{X}|^N |\hat{\mathcal{X}}|^N)$, where $T_{\max}$ is the number of iterations in the Blahut algorithm. However, using the factored Blahut algorithm (Algorithm 1) greatly reduces this complexity to $O(T_{\max} |\mathcal{X}| |\hat{\mathcal{X}}| N)$. In Section II-C, one of the proposed algorithms needs to compute the RD function for a design rate $R$. To do this, we apply the bisection method on $t$ to find the $t$ that corresponds to the chosen rate $R$.

• Step 0: Set tmin < (e.g., tmin = -10) 

• Step 1: If Rt^-^ > R, go to Step 3. Else go to Step 2. 

• Step 2: If i?t,„i„ = R then stop. Else if Rt^-^ < R, set tmin <— 2t^in and go to Step 1. 

• Step 3: Find t using the bisection method to get the correct rate R within e/j. 
The overall complexity of computing the RD function for a design rate R is 

Now, we consider the dependence of t^^x on e/j. It follows from ll27l that the error due to early 
termination of the Blahut algorithm is O ( ;^r^ ) . This implies that choosing Tmax = ^ ( ^ ) is sufficient. 
However, recent work has shown that a slight modification of the Blahut algorithm can drastically increase 
the convergence rate fBOl]. For this reason, we leave the number of iterations as the separate constant 
7"max and do not consider its relationship to the error tolerance. 

2) Complexity of computing RDE function.: Similarly, for each pair of parameters t < and s > 0, 
the complexity if we directly apply of the original Arimoto algorithm to compute the (i?|s,t, -D|s,t) pair 
is O(Tmaxl'^l^l'^l^) whcrc Tmax IS the number of iterations. Instead, if the factored Arimoto algorithm 
(Algorithm b| is employed, this complexity can also be reduced to 0(Tmax|'^||'^|-^)- In one of our 
proposed general algorithms in Section II-C[ we need to compute the RDE function for a pre-determined 



{R, D) pair. We use a nested bisection technique to find the Lagrange multipliers s, t that give the correct 
R and D. 

• Step 0: Set tmin < and Smax > (e.g., tmin = -10 and Smax = 2) 
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• Step 1: If ^U„„^,i,„i„ < R, set tmin ^ 2tmin and repeat Step 1. Else go to Step 2. 

• Step 2: Find t using the bisection method to obtain R\s^^„.t = R within e^. If i^|s,„,„,t > ^i go to 
Step 3. If -DU„,„,t = ^ then stop. Else if -D|s„„.t < D, set Smax •^ 2smax and go to Step 1. 

• Step 3: Find s using the bisection method to get the correct distortion D within eo while with each 
s doing the following steps 

- Step 3a: If -R|s,t,„i„ > R, go to Step 3c. 

- Step 3b: If i2|s,t„i„ = R^ then stop. Else if R\s,t^-^ < R, set t^ain ^ 2tmin and go to Step 1. 

- Step 3c: Find t using the bisection method to get the correct R within e/j. 
The overall complexity of computing the RD function for a design rate R is therefore 

O (r^axlog^ (^) log2 (^) l^ll^liv) . 

IV. Multiple Algebraic Soft-Decision (ASD) Decoding 

In this section, we analyze and design a distortion measure to convert the condition for successful 
ASD decoding to a suitable form so that we can apply the general multiple-decoding algorithm to ASD 
decoding. 

First, let us give a brief review on ASD decoding of RS codes. Let {/3i, /32, • • • ,/3Ar} be a set of 
N distinct elements in F^- From each message polynomial f{X) = /o + fiX + . . . + Jk-iX^^^ 
whose coefficients are in F^, we can obtain a codeword c = (ci, C2, . . . , cat) by evaluating the message 
polynomial at {/3j},^^, i.e., Cj = /(ft) for i = 1, 2, . . . , A^. Given a received vector r = (ri, r2, . . . , rjq), 
we can compute the a posteriori probability (APP) matrix 11 as follows: 

\n\j^i = TTij = Pr(cj = Ujln) for l<i < N,l <j < m. 

The ASD decoding as in |l3l| has the following main steps. 

1) Multiplicity Assignment: Use a particular multiplicity assignment scheme (MAS) to derive an mxN 
multiplicity matrix, denoted as M, of non-negative integer entries {Mij} from the APP matrix 11. 

2) Interpolation: Construct a bivariate polynomial Q{X, Y) of minimum {1, K — 1) weighted degree 
that passes through each of the point {Pj,ai) with multiplicity Mij for i = l,2,...,m and 
j = l,2,...,N. 

3) Factorization: Find all polynomials f{X) of degree less than K such that Y — f{X) is a factor of 
Q{X, Y) and re-evaluate these polynomials to form a list of candidate codewords. 

In this paper, we denote fj, = maxj j Mij as the maximum multiplicity. Intuitively, higher multiplicity 
should be put on more likely symbols. A higher fi generally allows ASD decoding to achieve a better 
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performance. However, one of the drawbacks of ASD decoding is that its decoding complexity is roughly 
0{N^fi'^) |31 1. Even though there have been several reduced complexity variations and fast architectures 
as discussed in |[32l . ||33]| . |[34ll . the decoding complexity still increases rapidly with fi. Thus, in this 
section we will mainly work with small fi to keep the complexity affordable. 

One of the main contributions of [3| is to offer a condition for successful ASD decoding represented 
in terms of two quantities specified as the score and the cost as follows. 

Definition 4: The score 5m (c) with respect to a codeword c and a multiplicity matrix M is defined 
as 

N 

i=i 
where [cj] = i such that aj = Cj. The cost Cm of a multiplicity matrix M is defined as 

m N 

1=1 j=l 

Condition 2: (ASD decoding threshold, see IS, ll35]| . (HI). The transmitted codeword will be on the 
Ust if 



(a + 1) 
for some a G N such that 



Sm-^{K-1) 



> Cm (8) 



aiK-l)<SM<ia + l)iK-l). (9) 

To match the general framework, the ASD decoding threshold (or condition for successful ASD 
decoding) should be converted to the form where the distortion is smaller than a fixed threshold. 

A. Bit-level ASD case 

In this subsection, we consider multiple trials of ASD decoding using bit-level erasure patterns. A 
bit-level error pattern 6" G Z2 and a bit-level erasure pattern 6" G Zg have length n = N x -q since 
each symbol has -q bits. Similar to Definition [1] of a conventional error pattern and a conventional erasure 
pattern, 6j = in a bit-level error pattern implies a bit-level error occurs and hi in a bit-level erasure 
pattern implies that a bit-level erasure is applied. We also use B^ and B^ to denote the random vectors 
which generate the realizations b^ and h^ , respectively. 

From each bit-level erasure pattern, we can specify entries of the multiplicity matrix M using the bit- 
level MAS proposed in |[35]| as follows: for each codeword position, assign multiplicity 2 to the symbol 
with no bit erased, assign multiplicity 1 to each of the two candidate symbols if there is 1 bit erased, 
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and assign multiplicity zero to all the symbols if there are > 2 bits erased. All the other entries are zeros 
by default. This MAS has a larger decoding region compared to the conventional errors-and-erasures 
decoding scheme. 

Condition 3: (Bit-level ASD decoding threshold, see [35]) For RS codes of rate ^ > | + ;^, ASD 
decoding using the bit-level MAS will succeed (i.e., the transmitted codeword is on the list) if 

3ub + eb<^{N-K + l) (10) 

where e^ is the number of bit-level erasures and Uh is the number of bit-level errors in unerased locations. 
We can choose an appropriate distortion measure according to the following proposition which is a natural 
extension of Proposition [T] in the symbol level. 

Proposition 3: If we choose the bit-level letter-by-letter distortion measure 5 : Z2 x Z2 — )■ M>o as 

follows 

5(0,0) = 1, 5(0,1) = 3, 

5(1,0) = 1, 5(1,1) = 0, 



then the condition ( 10 1 becomes 



d(6",r)<^(iV-i^ + l). (11) 



Proof: The condition ( 10 1 can be seen to be equivalent to 

^d(6",6") <N-K + 1 

using the same reasoning as in Proposition [T] The results then follows right away. ■ 

Remark 6: We refer the multiple-decoding of bit-level ASD as m-bASD. 

B. Symbol-level ASD case 

In this subsection, we try to convert the condition for successful ASD decoding in general to the form 
that suits our goal. We will also determine which multiplicity assignment schemes allow us to do so. 

Definition 5: (Multiplicity type) Consider a positive integer £ < m where m is the number of elements 
in F™. For some codeword position, let us assign multiplicity rrij to the j-th most likely symbol 
for j = 1,2,..., I. The remaining entries in the column are zeros by default. We call the sequence, 
(m,i,m,2, . . . ,1711), the column multiplicity type for "top-£" decoding. 

First, we notice that a choice of multiplicity types in ASD decoding at each codeword position has 
the similar meaning to a choice of erasure decisions in the conventional errors-and-erasures decoding. 
However, in ASD decoding we are more flexible and may have more types of erasures. For example, 
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assigning multiplicity zero to all the symbols (all-zero multiplicity type) at codeword position i is similar 
to erasing that position. Assigning the maximum multiplicity fi to one symbol corresponds to the case 
when we choose that symbol as the hard-decision one. Hence, with some abuse of terminology, we also 
use the term (generalized) erasure pattern x^ for the multiplicity assignment scheme in the ASD context. 
Each erasure-letter Xi gives the multiplicity type for the corresponding column of the multiplicity matrix 
M. 

Definition 6: (Error patterns and erasure patterns for ASD decoding) Consider a MAS with T mul- 
tiplicity types. Let x^ G {1,2...,T}^ be an erasure pattern where, at index i, Xi = j implies that 
multiplicity type j is used at column i of the multiplicity matrix M. Notice that the definition of an error 
pattern x^ E '^f+i ^^ Definition p^ applies unchanged here. 

Remark 7: In our method, we generally choose an appropriate integer a in Condition [2] and design 
a distortion measure corresponding to the chosen a so that the condition for successful ASD decoding 
can be converted to the form where distortion is less than a fixed threshold. The following definition of 
allowable multiplicity types will lead us to the result of Lemma [1] and consequently, a > /i, as stated 
in Corollary [T] Also, we want to find as many as possible multiplicity types since rate-distortion theory 
gives us the intuition that in general the more multiplicity types (erasure choices) we have, the better 
performance of multiple ASD decoding we achieve as A^ becomes large. 

Definition 7: The set of allowable multiplicity types for "top-^" decoding with maximum multiplicity 
/i is defined to bqj 

Ej=i "^i(/" - ^j) <(/" + !) (|{j -rrij 7^0}\- 1) mmj.,m,j^o ruj 



A{ti, 



A 



E' 

[mi, 1712,. ■■,me^ 



(12) 

We take the elements of this set in an arbitrary order and label them as 1,2, . . . ,\A{fi,£)\ with the 
convention that the multiplicity type 1 is always (/i, 0, . . . , 0) which assigns the whole multiplicity /x to 
the most likely symbol. The multiplicity type k is denoted as (mi fc, ?TT-2,fc, • • • rrii^k)- 

Remark 8: Multiplicity types (0, 0, ... , 0), (1, 1 . . . , 1) as well as any permutations of (/i, 0, . . . , 0) and 
(L2 J ' L2-1 1 0' • • • ' 0) ^^ always in the allowable set A{^, fi). We use mASD-/i to denote the proposed 
multiple ASD decoding using A{fi, jjl). 

Example 4: Consider mASD-2 where ^ = ^ = 2. We have A{2, 2) = {(2, 0), (1, 1), (0, 2), (0, 0)} which 
comprises four allowable multiplicity types for "top-2" decoding as follows: the first is (2, 0) where we 



6 



We use the convention that vainj.m-^o rrij = if {j : rrij 7^ 0} = 
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assign multiplicity 2 to the most likely symbol yj i, the second is (1,1) where we assign equal multiplicity 

1 to the first and second most likely symbols y^ i and yj 2> the third is (0, 2) where we assign multiplicity 

2 to the second most likely symbol y^ 2> and the fourth is (0, 0) where we assign multiplicity zero to all 
the symbols at index i (i.e., the i-th column of M is an all-zero column). We also consider a restricted 
set, called mASD-2a, that uses the set of multiplicity types {(2, 0), (1, 1), (0, 0)}. 

Example 5: Consider mASD-3. In this case, the allowable set ^(3, 3) consists of all the permutations 
of (3, 0, 0), (0, 0, 0), (1, 1, 0), (2, 1, 0), (1, 1, 1). We can see that the set ^(3, 2) consists of all permutations 
of (3, 0), (2, 1), (1, 1), (0, 0) and |^(3, 2)| < |^(3, 3)|. 

From now on, we assume that only allowable multiplicity types are considered throughout most of the 
paper. With that setting in mind, we can obtain the following lemmas and theorems. 

Lemma 1: Consider a MAS(;U, i) for "top-£" ASD decoding with multiplicity matrix M that only 
uses multiplicity types in the allowable set A{fi,£). Then, the score and the cost satisfy the following 
inequality: 

2Cm > (;^ + 1)Sm- 

Proof: Let us denote e^ = \{i G {1,...,A^} : Xi = k}\ to count the number of positions i 
that use multiplicity type k for k = 1,...,T and notice that X]a;=i ^^^ ~ ^- ^^ ^^^^ ^^^ ^j,k = 
\{i G {1,...,A^} : Xi / j,Xi = k}\ to count the number of positions i that use multiplicity type k 
where the j-th most reliable symbol yij is incorrect for j = 0, . . . , £ and k = 1, . . . , T. The notation 
Xj,k = |{^ G {1; • • • ) N} ■ Xi = J, Xi = k}\ remains the same. Notice also that 

Cfc = ^ Xj,fc and Xj,fe = Cfc - Vj^k- (13) 

i=o 

The score and the cost can therefore be written as 

N 



J 




(14) 



kXj,k (15) 



T e 
+ X] X] ^j,k{ek - i'j,k) (16) 

k=2 j=l 
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and 



m N 



T £ 

fc=i i=i 

(T \ T € 

fe=2 / fe=2 1 = 1 



rrij^kimj^k + 1) 



(17) 



where (15l and (17i use the fact that the multipUcity type 1 is always assumed to be (/i, 0, 

Hence, we obtain 

T e T e 

2Cm - (m + 1)S'm = ^(Ai + l)i^i,i + y^(/u + l)^mj,fc^'j,fc - ^efc^mj,fc(^ 

A:=2 j=l fc=2 j=l 

and therefore, since fi and i/i^i are non-negative. Lemma [T] holds if we can show 



,0). 



"ij,fc), 



(/^ + 1) X] '^i'f'^i'k > efc ^ m-j- fc(/x - m^- fc) 



for every k = 2, . . .T. 
Next, we observe that 



and 



— \ — I i'-rrii fc#0 



Yl T'hk = Y {ek- Xj,k) 

= ek\{j : ruj^k 7^ 0}\ - ^ Xj,k 



>efc(|{i:m,-fc/0}|-l) 
where (20l follows from ([13]) and (2]_) follows from 



j:mJ,fc^O i=o 



From (19 1 and (21 1, we have 



(^ + 1) V" rnj^ki^j^k > efc(/i + l)(|{j : mj,A; / 0}| - 1) min ruj^k 
and this motivates our definition of allowable multiplicity types. 



(18) 



(19) 



(20) 



(21) 



(22) 
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Specifically, if we choose {mi /c,?Ti,2,fc, • • • ,"i^,A:} in the allowable set A{fi,i), defined in (12i, then 



by combining with (22 1, we obtain (18 1 and this completes the proof. 



Corollary 1: With the setting as in Lemma [T] the integer a in Condition |2] must satisfy a > fi. 
Proof: From (a + 1) [5m - |(K - 1)] > Cm and Sm < {a + l)(i^ - 1) in (|8|) and Q, we have 

(a + 1)5m - Cm > ^aia + l)iK- 1) 



and this implies that 



2Cm < ia + 2)SM- 



(23) 



But, Lemma [I] states that 2Cm > (/" + 1)5'm- Combining this with (23 1 gives a contradiction unless 
a > /i — 1. ■ 

In Condition [2| if we carefully design a distortion measure then for every a > /i, the first constraint ^ 
can be equivalently converted to the form where distortion is smaller than a fixed threshold. 

Theorem 1: Consider an {N,K) RS code and a MAS(fi,£) for "top-f decoding with multiplicity 
matrix M that only uses T multiplicity types in the allowable set A(p,i). Consider an arbitrary integer 
a> ji. Let 5a '■ X X X ^ M>o, where in this case X = Z^+i and X = I^t+i \ {0}, be a letter-by-letter 
distortion measure defined by 5a{x, x) = [Aa]^, £, where Aq is the {i + 1) xT matrij^n 



Aa 



Pl,a 



Pl,a 
2mi 



P2,a 



P2,a 

2mi,: 



2m2 1 2m,2 
Pl,a TT- P2,a T" 



1 2mf,i 2me,2 
\ Pl,a ^ P2,a ^ 



PT,a 



PT,a 
PT,a 



2m-i 1 



2m,2 



(24) 



pT,a — ) 



with 



Pk,a 



fj,{2a + l-fi) \r^ rrij^kimj^k + ^) 



1) ^ 



j=i 



a(a + 1) 



a{a+ 1) 
ior k = 1, . . . ,T. Then, the equation ([8]) in Condition |2] is equivalent to 

[N-K + l^Da, 



d(x^,x^)<^('" + '-^)- -■^- 



a(a + 1) 



and it is easy to verify that Dn = N — K + 1. 



^The first column of A^ is [ — ,0, —,—,..., ?i^]^ since multiplicity type 1 is always chosen to be (p, 0, 0, ... , 0). 
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Proof: First, we show that A^ consists of non-zero entries. It suffices to show that pk^a > ™"^''° for 
all j = 1, . . . , £ and /c = 1, . . . , T, i.e., 



li{2a + 1 - /x) + 2J mj'^kimj'^k + 1) > 2mj^k{a + 1) 



which is equivalent to 



2(a + l)(/i - mj^fc) + ^ rnj,^k{mj>^k + 1) - A^(a* + 1) > 0. 



(25) 



This is true since the left hand side of ( 25 1 is at least 



2{fj. + l){fi- rrij^k) + mj^kimj^k + 1) - /^(a* + 1) = (^ - "^i,A:)(/W + 1 - ruj^k) > 0. 

With the same Ck, Vj^k, Xj,k as defined in the proof of Lemma [T] and the chosen distortion matrix Aq, 
we have 

T 



fc=l \i=l 



2m 



'i,k 



Y. \Pk,aY.^i^k-2Y,^X, 



Xj,k + Pk,aXO,k 



k=l \ j=0 

T 



i=i 



2^ pk,aek-2}_^—^Xj, 
fc=i V j=i " 

|- 2/j n 2/i 2/i 2/j-ir 



Noting that the first column of A^ is always [^, 0, ^, ^, . . . , ^]^ and vi^ = ei - xi,i, we obtain 

a 



d{x^ ,x^) = —2^1,1 + ^ Pk,aek - 2 XI XI ^^^i.fc- 

fc=2 fc=2 i=l 

Next, one can see that ^ can be rewritten as 



(26) 



25*]^ T^ , 1 ^ 2Cm 
A + i > 



a a{a + 1) 

which, by substituting 5m and Cm in (16 1 and ( [T7] ), is equivalent to 



2ii 



T 



T 






a \ ^-^ ] ^-^ ^-^ a aia + l) \ ^-^ ] ^-^ ^-^ a(a + 1 



fc=2 / k=2 j=l 

Equivalently, this gives 



fc=2 / k=2 



'^ a(a + l) 



2/i ^(^ + 1; 



2/i ^(^ + 1) v^ rrij^ki'mj^k + 1) 



i/i ^(^ + ij\ j^ , ^ ^ 2^ oY^Y^^^i''^ , Y^ M/^ P[P + ^) ,sr^ rnj^k[mj,k + 

7 — riTF^~^+-^> — ^i.i~2 > > Xj,k+} ek\ 7 — r-jT + Z^ ? — TIT 

a a{a + l)J a ^^ « ^ \« «(« + 1) p^ M^ + 1) 
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which in turn is equivalent to 

T T i 

, , ,. N -K + l> ^1/1,1 + > ekPk,a - l^l^ ^j,kXj,k 



(27) 



fc=2 



fc=2 i=l 



Finally, combining (p6|) and ( 27 1 gives the proof. 



Example 6: Consider mASD-2 for a = ^ = 2. In this case, the distortion matrix is 

^2 5/3 2 l\ 

A = 



(28) 



2/3 2 1 

\^ 2 2/3 1 y 

However, Condition |2] also requires the second constraint Q to be satisfied. In addition, we need to 
choose an integer a > /i in order to apply our proposed approach. Therefore, we first consider the case 
of high-rate RS codes where if a = /U then the satisfaction of Q also implies the satisfaction of Q. For 
the case of lower-rate RS codes, we obtain a range of a and also propose a heuristic method to choose 
an appropriate a. 

1) High-rate Reed-Solomon codes: In this subsection, we focus on high-rate RS codes which are 
usually seen in many practical apphcations. The high-rate constraint allows us to see that a = /i is 
essentially the correct choice. 

Lemma 2: Consider an (A^, K) RS code with rate 

K 1 n 

— > \ ^— . 

N - N n+l 

If equation (|8]l is satisfied for a = /x, or equivalently, 

d{x^,x^) <N-K + l 

under the distortion measure A^ then whole Condition |2] is satisfied and the transmitted codeword will 
be therefore on the list. 

Proof: Suppose ([8]) is satisfied for a = fi, i.e.. 



S'm > 



Cm 

/i + 1 



M 



(K-l) 



We will show that 



f,{K - 1) < 5m 

<{^ + l){K-l) 

and, therefore, both ^ and Q in Condition |2] are satisfied for a = jjl. 



(29) 

(30) 
(31) 
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Firstly, using Lemma [T] we have 



2 /U + 1 



and consequently, (BOb is implied by ( 29 1 since 



¥^«"-^>^(^->)- 



Secondly, note that ( 3 1 1 holds since 



/ T \ T £ 

\ k=2 / k=2j=l 

re T / e 

k=2j=0 k=2 \ i=l 



<(/x+l)(i^-l) 



(32) 
(33) 



where ( 32 1 is obtained by dropping non-negative terms and ( 33 1 follows from the high-rate constraint 



K-l > _JJ_ 

Finally, by Theorem [1] one can verify that equation ([8]l with a = ^u is equivalent to 

d{x^,x^) <D^ = N -K + 1 

under the distortion measure A^. ■ 

However, there are possibly other integers a ^ ji that can also satisfy Condition [2] If we consider higher- 
rate RS codes, as in the following theorem, then we can claim that a = /x is the only such integer. 



Theorem 2: Consider an [N, K) RS code with rate 



K 1 



//(// + 3) 



N - N (/i + l)(/i + 2)' 
The integer a in Condition [2] must satisfy a = j^l and, consequently, the set of constraints Q and 
Condition |2] is equivalent to 

d{x^,x^) <N-K + l 

under the distortion measure A„. 



m 



Proof: We first see that 



ia + 1) 



Sm-^{K-1] 



> Cm 
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in ([8]l implies 



5M-"(i^-l)>% 
2 a + 1 



and, with the score Sm and the cost Cm computed in ( 16 1 and ( 17 1, we obtain 



(T \ T e 

fe=2 / k=2j=l 

/x(/i+l) f ^^\ , ^^ ^ rrij^kimj^k + I) 

This gives 



-^j iV- ^(E: - 1) > /ii/i,i + J] ^ i/j- fc + J^^efc L- J] m,- fe + J]; 



^ '^^ i=2i=i fc=2 \ j=i i=i ^ ^ ^ 



T / £ \ 

fc=2 \ j=l / 

> (35) 



where (34i is obtained by dropping non-negative terms. 



Combining this inequality with the high-rate constraint implies that 

\i{2a + 1 - ;u) K-\ ^ fi{n + 3) 



a(a + l) N - (^ + l)(/i + 2) 

which leads to a < /z + 1, i.e. a < /x. 

This, together with a > fi according to Corollary [T] leave a = /i as the only possible choice. Finally, 
by seeing that 

J^^ 1 M^ + 3) 1 /i 

TV - iV (;U + l)(/i + 2) Af ;u + l 

and applying Lemma [2] we conclude the proof. ■ 

Corollary 2: When the RD approach is used, R{D) is positive for -Dmin < D < -Dmax and is zero for 
D > -Dmax- Computing -Dmax reveals how good the distortion measure matrix is at rates close to zero 
(i.e., the erasure codebook has only one entry). For mASD-^, 

TV r e 

^max(inASD-/i) = X , i^lil^ { 2(1 - Pi,l),Pk,fj. - X ~^^PiJ 
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TABLE I 

Example ranges of possible a 




RS(255,191) 


RS(255,127) 


11 = 2 


2 < a< 3 


2 < a < 6 


M = 3 


3 < a< 5 


3< a<9 



while for mBM- 



N 



Di„ax(mBM-^) = Y, mMl, 2(1 - p^,l)}. 



i=l 



Moreover, if mASD-/i uses multiplicity type (0,0, ...0) then Dinax(inASD-;u) < Dmax(inBM-£) for 
every /x,£. 

Proof: See Appendix [A] ■ 



Example 7: Consider mASD-2 with distortion matrix in (28 1. We have 



N . 

Z)n,ax(mASD-2) = J]min j 1,2(1 -po), 



\{Pi,l +Pi,2) 



which is less than or equal to Dinax(mBM-^) for every l. This fact can be seen in Fig. B] which is obtained 
by simulation. This also predicts that, as expected, ASD decoding will be superior when R is small. 

2) Lower-rate Reed-Solomon codes: Without the high-rate constraint as in Theorem |2| we may not 
have a = fi. However, we can obtain a range for a and heuristically choose the integer a that potentially 
give the highest rate-distortion exponent. After that, we can also apply the algorithms proposed in Section 
II-C with the corresponding distortion measure A^ and distortion threshold Da derived in Theorem [T] 

The following lemma tells us the range of possible a. 

Lemma 3: Consider an (N,K) RS code. In order to satisfy ([8]l, one must have 

fiO - 1/2 + V/^20 (^ - 1) + 1/4 



fj, < a < 



where 6 



N 
K-1- 



Proof: First note that (35) holds for any {N,K). Therefore, we have 



M^+1) ^a{K-l) 



2(a + l) ' 2N 

Combining this with a > /U in Corollary [T] we obtain the stated result. ■ 

Example 8: Table |l] gives several example ranges of possible a for some choices of p, and RS codes. 
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Among possible choices of a, we are interested in choosing a that gives the largest rate-distortion 
exponent and therefore has a better chance to satisfy Condition |2] The following lemma can give us an 
insight of how to choose such an integer a. 

Lemma 4: If 

a > ^ (^1 + 4^/^(^ + 1) - 3) (36) 

where 9 = j^^ then starting from a, the rate-distortion exponent Fa strictly decreases until reaching 
zero, i.e., Fa > -Fa+i > Fa+2 > . . . > if rate R is fixed. 

Proof: For a fixed rate R, the distortion measure A^+i and distortion Da+i yield exponent Fa+i- 
Scaling both Aa+i and Da+i leaves Fa+i unchanged. Hence, ^ilA^+i and ^^Da+i also yield Fa+i- 
Next, we will show that 

"^A„+i > A,. (37) 



To prove (37 1, it suffices to show 



a+1 

Pk,a+1 > Pk,a (38) 



since 



a + 1 f 2mj^k \ . 2mj,fc 

Pk,a+i r^ > Pk,a 

a \ a+l J a 



is also equivalent to (38 1. 



Equivalently, we need to show 

e 

p{p + 1) > ^ rrij^kimj^k + 1) 
i=i 

which is true because fi > X]i=i ^-j.fc by the definition of allowable multiplicity types. 

Thus, (37 1 holds and, therefore, the exponent yielded by A^ and ^^Da+i is at least Fa+i- From (36 1 



we have 



a[a + 1) 
M2a + 3-M) ^_a+^ 

a(a + 2) a ^ ' 

_« + in 

a 



Since for a fixed R, exponent F is increasing in distortion D 11241 Thm 6.6.2], we know that Fa > Fa+i 
where Fa is the exponent yielded by A^ and Da- ■ 

[h] 
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TABLE n 

Example ranges of a that gives the largest exponent 





RS(255,191) 


RS(255,127) 


M = 2 


a = 2 


a£ {2,3} 


M = 3 


a = 3 


ae {3,4} 


^ = 12 


ae {12,13} 


12 < a< 17 



•- ;U = 2, ^ft/A^o = 6.0 ciB 
^fi = 3, Eb/No = 6.0 dB 
2, ^i/A^o = 6.5 dB 
.5dB 




Fig. 2. Plot of exponent Fa versus a foi fi — 2 and /j. — 3 with a fixed rate R = 6. Simulations are conducted for the 
(255,127) RS code using BPSK over an AWGN channel at Eb/No = 6.0 dB and 6.5 dB. 



Corollary 3: The integer a that gives the largest exponent lies in the range 

1 



fi < a < 



Vl + 40/u(;U + 1) - 3 



+ 1. 



Example 9: The following Table [ll| presents several example ranges of a that gives the largest exponent 
for some choices of fi and RS codes. 

Remark 9: Simulation results also confirm our analysis. For example, in Fig. [2| a = 3 and a = 4 give 
roughly same and the largest exponents for /i = 3 while a = 2 yields the largest exponent for /i = 2. In 
fact, simulation results suggest that, typically, either a = /xora = /x + l gives the best exponent. 

In Condition |2j for lower-rate RS codes, so far we have only paid attention to ([8]l. However, it is also 



November 9, 2010 



DRAFT 



29 



required that 



or equivalently 



a(K-l)<5M<(a + l)(K-l), 
a + 1 



(39) 



K -I 

While it is hard to tell exactly which a will satisfy (|39]) with high probability right away, we can propose 
a heuristic method to choose the integer a that is likely to work. We first need the following lemma. 
Lemma 5: Suppose we have obtained a test-channel input-probability distribution matrix Q (e.g., 



during Step 2a or Step 2b in the proposed algorithms in Section II-C I and the set of erasure patterns 



(40) 



for mASD is generated independently and randomly according to Q. Then, the expected score can be 

computed as follows: 

T e N 

1E[5m] = XI XI XI ^J,kPi,jQi,k- 

k=l j=l i=l 
Proof: The proof follows from the following equations: 

T e 

X X '^J'kXj,k 



ns, 



M 



E 



fc=i j=i 



(41) 



X,=k} 



^^^J,kHXj,k] 

fc=l i=l 

T e r N 

XX"^J'fc^ X^{x.=i, 
fc=i i=i Li=i 

TIN 

XXX "^^> p^(^^ = ^'' ^i = ^) 

fc=l j=l i=l 

TEN 

XXX"^j>p^j^^'^ 

fe=l j=l i=l 



where (|4T]l is implied by ([T4|). ■ 

Next, we propose a heuristic method to find the appropriate integer a to work with as follows. 
Algorithm 3: 

• Step 1: Start with a = fi, using distortion measure A^ and distortion threshold Da to get the 
corresponding distribution matrix Q as discussed above. 

• Step 2: Compute the expected score E[S'm] using (40 1. If ^-_^^ = a + 1 then output a and stop. 
If not set a -^ a + 1 and return to Step 1. 
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k^ 




Fig. 3. Plot of exponent Fa versus a for /i = 10 with a fixed rate R = 6. The set of multiplicity types considered is the 
relaxed set ^o(10, 2). Simulations are conducted for the (458,410) RS code over F210 using BPSK over an AWGN channel at 
Eb/No = 6.0 dB and 6.5 dB. 



Remark 10: In simulations with small to moderate /i, it is usually found that a is either ^ or ^ + 1. 
Typically, -^^ > ^ and a unit increase of a produces a small increase in j^"^ ■ 



Remark 11: So far, we have considered only the allowable multiplicity types in Definition |7] It is 
possible to obtain better performance if we relax some constraints and allow multiplicity types to be in 
the relaxed set 

Ao{^i,i) = ^{mi, 7712,..., mi) Yfj=i^j < /^ I • 

In this case, some theoretical results, e.g., results in Lemma 1 and Theorem 2, do not hold. However, 
this modification combined with the heuristic method above can improve the decoding performance, 
especially with large ^u. Specifically, we consider mASDo-/U which denotes our proposed multiple ASD 
decoding algorithm that only uses multiplicity types (0, 0) and(?7ii, 7712) of the form mi + m2 = /x. These 
multiplicity types form a subset of Aq{^i, 2). The choice of £ = 2 is suggested by observations that top-2 
decoding performs almost as good as top-£ decoding for I > 2. The integer a used in mASDo-^ is found 
through the heuristic method. In Fig. |3] simulations are conducted for the (458,410) RS code using BPSK 
over an AWGN channel. For ji = 10, it can again be observed that a = /i gives the best exponent. More 
simulation results of this heuristic method can be seen in Section IVIII 
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V. Closed-Form Analysis of RD and RDE Functions for Some Distortion Measures 

A. Closed-form RD function 

For some simple distortion measures, we can compute the RD functions analytically in closed form. 
First, we observe an error pattern as a sequence of i.n.d. random source components. Then, we compute 
the component RD functions at each index of the sequence and use convex optimization techniques to 
allocate the total rate and distortion to various components. This method converges to the solution faster 



than the numerical method in Section III The following two theorems describe how to compute the RD 
functions for the simple distortion measures of Proposition [T] and [3] 

Lemma 6: Consider a binary source X where Pt{X = I) = p and Ft(X = 0) = I — p . With the 
distortion measure in ^, the rate-distortion function for this source iqj 

R{D) = [Hip)-HiD + p-l)]+. 

Proof: See Appendix |B] ■ 

Theorem 3: (Conventional errors-and-erasures "mBM-1" decoding) Let pj i = Pr(Xj = 1) for i = 

1, . . . ,N. The overall rate-distortion function is given by 

^ + 

R{D) = Y^ [i7(p,,i) - F(A)] 

i=l 

where Di = Di + pi^i — 1 and Di can be found be a reverse water-filling procedure (see EOl Theorem 
13.3.3]): 

A if A < min{pj,i, 1 - pj^i} 

minjpj^i, 1 — Pi,i} otherwise 
where A should be chosen so that 

N N 

Y,bi = D + Y,Pi^i-N. 
The R{D) function can be achieved by the test-channel input-probability distribution 



Di 



and 



qifl ^ Pr(Xi = 0) 



(?,,i ^ Pr(X, = 1) 



1-2A 

Pi,i - Dj 
1 - 2 A 



Proof: See Appendix |C] 

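The binary entropy function is $H(u) = -u\log u - (1 - u)\log(1 - u)$.

Theorem 4: (Bit-level ASD "m-bASD" decoding) Let $r_{i,1} = \Pr(B_i = 1)$ and $r_{i,0} = \Pr(B_i = 0)$ for $i = 1, \ldots, n$. The overall rate-distortion function in the m-bASD scheme is given by

$$R(D) = \sum_{i=1}^{n} \left[R_i(\lambda)\right]^+,$$

where

$$R_i(\lambda) = H(r_{i,1}) - H\!\left(\frac{1}{1+\lambda+\lambda^2}, \frac{\lambda}{1+\lambda+\lambda^2}, \frac{\lambda^2}{1+\lambda+\lambda^2}\right) + r_{i,1}\, H\!\left(\frac{1}{1+\lambda}\right),$$

the three-argument $H(\cdot,\cdot,\cdot)$ is the entropy of a ternary distribution, and the distortion component $D_i$ is given by

$$D_i = \begin{cases} \dfrac{1+2\lambda+3\lambda^2}{1+\lambda+\lambda^2} - r_{i,1}\,\dfrac{1+2\lambda}{1+\lambda} & \text{if } \dfrac{\lambda(1+\lambda)}{1+\lambda+\lambda^2} < r_{i,1} < \dfrac{1+\lambda}{1+\lambda+\lambda^2}, \\ \min\{1, 3(1 - r_{i,1})\} & \text{otherwise,} \end{cases}$$

where $\lambda \in (0, 1)$ should be chosen so that $\sum_{i=1}^{n} D_i = D$. The $R(D)$ function can be achieved by the following test-channel input-probability distribution

$$q_{i,0} \triangleq \Pr(\hat{B}_i = 0) = \frac{(1+\lambda) - r_{i,1}(1+\lambda+\lambda^2)}{1-\lambda^2}$$

and

$$q_{i,1} \triangleq \Pr(\hat{B}_i = 1) = \frac{r_{i,1}(1+\lambda+\lambda^2) - \lambda(1+\lambda)}{1-\lambda^2}.$$

Sketch of proof: With the m-bASD distortion measure, using the method in [26, Chapter 2] we can compute the rate-distortion function components

$$R_i(D_i) = H(r_{i,1}) - H\!\left(\frac{1}{1+\lambda_i+\lambda_i^2}, \frac{\lambda_i}{1+\lambda_i+\lambda_i^2}, \frac{\lambda_i^2}{1+\lambda_i+\lambda_i^2}\right) + r_{i,1}\, H\!\left(\frac{1}{1+\lambda_i}\right),$$

where $\lambda_i$ is a Lagrange multiplier such that

$$D_i = \frac{1+2\lambda_i+3\lambda_i^2}{1+\lambda_i+\lambda_i^2} - r_{i,1}\,\frac{1+2\lambda_i}{1+\lambda_i}$$

for each bit index $i$. Then, the Kuhn-Tucker conditions define the overall rate allocation using an argument similar to the one in the proof of Theorem 3. ■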
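A minimal Python sketch of Theorem 4 follows. It evaluates $R_i(\lambda)$ and $D_i$ exactly as stated and, as an assumption of the sketch, finds the common $\lambda$ by bisection, relying on the total distortion increasing with $\lambda$ (it runs from $\sum_i (1 - r_{i,1})$ as $\lambda \to 0$ to $\sum_i \min\{1, 3(1 - r_{i,1})\}$ as $\lambda \to 1$). The names `bit_component` and `bit_asd_rd` are ours.

```python
import math


def h2(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def h3(a, b, c):
    """Entropy in bits of the ternary distribution (a, b, c)."""
    return -sum(v * math.log2(v) for v in (a, b, c) if v > 0)


def bit_component(r, lam):
    """(R_i, D_i) of Theorem 4 for one bit with r = Pr(B_i = 1)."""
    z = 1 + lam + lam ** 2
    if lam * (1 + lam) / z < r < (1 + lam) / z:      # active bit
        d = (1 + 2 * lam + 3 * lam ** 2) / z - r * (1 + 2 * lam) / (1 + lam)
        rate = h2(r) - h3(1 / z, lam / z, lam ** 2 / z) + r * h2(1 / (1 + lam))
        return max(rate, 0.0), d
    return 0.0, min(1.0, 3 * (1 - r))                # inactive bit: zero rate


def bit_asd_rd(r_list, D, iters=200):
    """Total rate R(D) and the common lambda, found by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        lam = (lo + hi) / 2
        if sum(bit_component(r, lam)[1] for r in r_list) < D:
            lo = lam                                 # too little distortion
        else:
            hi = lam
    lam = (lo + hi) / 2
    return sum(bit_component(r, lam)[0] for r in r_list), lam
```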

B. Closed-form RDE function 

In this subsection, we consider the mBM-1 case and its distortion measure. We study the setup where RS codewords defined over the Galois field $\mathbb{F}_m$ are transmitted over the $m$-ary symmetric channel ($m$-SC), which for each parameter $p$ can be modeled as

$$\Pr(r \mid c) = \begin{cases} p & \text{if } r = c, \\ (1-p)/(m-1) & \text{if } r \neq c. \end{cases}$$

Here, $c$ (resp. $r$) is the transmitted (resp. received) symbol and $r, c \in \mathbb{F}_m$. For this channel model, we restrict our attention to the range of $p$ where the received symbol is the most likely one (i.e., $p > (1-p)/(m-1)$). Therefore, at each index $i$ of the codeword, the hard decision is also the received symbol, and it is correct with probability $p$. Thus, we have $p_{i,1} = \Pr(X_i = 1) = p$ for every index $i$ of the error pattern $x^N$. That is, in this context we have a source $x^N$ with i.i.d. binary components $X_i$. Since the components $X_i$ are i.i.d., we can treat each $X_i$ as a binary source $X$ with $\Pr(X = 1) = p$ and first compute the RDE function for this source $X$, as given by the analysis in Appendix D. Based on this analysis, we obtain the following lemmas and theorems for the mBM-1 decoding algorithm of RS codes over an $m$-SC channel.
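The reduction to an i.i.d. binary source can be checked by simulation. The Python sketch below is illustrative only: it represents the elements of $\mathbb{F}_m$ by the integers $0, \ldots, m-1$ (an assumption that is harmless because the $m$-SC treats all symbols alike), sends the all-zero word, and estimates $\Pr(X_i = 1)$.

```python
import random


def msc_error_pattern(codeword, m, p, rng):
    """Send one codeword through an m-ary symmetric channel and return x^N.

    Symbols are integers 0..m-1 (a stand-in for the elements of F_m). Each
    symbol is received correctly with probability p and otherwise replaced by
    one of the other m - 1 symbols, chosen uniformly. Since the hard decision
    equals the received symbol, x_i = 1 exactly when r_i = c_i.
    """
    x = []
    for c in codeword:
        if rng.random() < p:
            r = c
        else:
            r = rng.choice([v for v in range(m) if v != c])
        x.append(1 if r == c else 0)
    return x


rng = random.Random(2010)
frames = [msc_error_pattern([0] * 255, 256, 0.98, rng) for _ in range(200)]
freq = sum(map(sum, frames)) / (200 * 255)   # empirical Pr(X_i = 1), close to p
```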

Lemma 7: Let $h(u) = H(u) - H(u + D - 1)$ map $u \in [1 - D, 1 - D/2)$ to $R$. Then, the inverse mapping of $h$,

$$h^{-1} : (0, H(1 - D)] \to [1 - D, 1 - D/2),$$

is well-defined and maps $R$ to $u$.

Proof: $h(u)$ is strictly decreasing since its derivative is negative over $[1 - D, 1 - D/2)$. Hence, the mapping $h : [1 - D, 1 - D/2) \to (0, H(1 - D)]$ is one-to-one. From the analysis in Appendix D, one can also see that $h$ is onto. ■

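Theorem 5: Using mBM-1 with $2^R$ decoding attempts where $R \in (0, N H(1 - D/N)]$, the maximum rate-distortion exponent that can be achieved is

$$F = N\, D_{KL}\big(h^{-1}(R/N) \,\big\|\, p\big), \tag{42}$$

where $h$ is defined as in Lemma 7 with the per-component distortion $D/N$ in place of $D$.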



Proof: First, note that in our context, where we have a source sequence $x^N$ of $N$ i.i.d. source components, the rate and exponent for each source component are now $R/N$ and $F/N$. From Case 3 in Appendix D and from Lemma 7, we have

$$\frac{F}{N} = D_{KL}(u \,\|\, p) = D_{KL}\big(h^{-1}(R/N) \,\big\|\, p\big),$$

and the theorem follows. ■
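Equation (42) can be evaluated numerically by inverting $h$ with bisection, which is valid because Lemma 7 shows that $h$ is strictly decreasing. In the Python sketch below, the names `h_inv` and `mbm1_exponent` are hypothetical, and the channel parameter $p = 0.985$ is an arbitrary illustrative value rather than one taken from Fig. 4.

```python
import math


def h2(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def kl2(u, p):
    """Binary divergence D_KL(u || p) in bits, for 0 < p < 1."""
    out = 0.0
    if u > 0:
        out += u * math.log2(u / p)
    if u < 1:
        out += (1 - u) * math.log2((1 - u) / (1 - p))
    return out


def h_inv(r, d, iters=200):
    """Inverse of h(u) = H(u) - H(u + d - 1) on [1 - d, 1 - d/2), for 0 < d < 1."""
    if not 0 < r <= h2(1 - d):
        raise ValueError("r must lie in (0, H(1 - d)]")
    lo, hi = 1 - d, 1 - d / 2        # h decreases from H(1 - d) down to 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if h2(mid) - h2(mid + d - 1) > r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mbm1_exponent(R, N, D, p):
    """F = N * D_KL(h^{-1}(R/N) || p) as in (42), per-position distortion D/N."""
    return N * kl2(h_inv(R / N, D / N), p)


# (255,239) RS code: D = N - K + 1 = 17; p = 0.985 is illustrative only.
F = mbm1_exponent(11, 255, 17, 0.985)
fer_estimate = 2 ** (-F)
```

The quantity `fer_estimate` corresponds to the FER approximation $2^{-F}$ discussed in Remark 12.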

Lemma 8: Let $g(u) = D_{KL}(u \,\|\, p)$ map $u \in [1 - D, p]$ to $F$. Then, the inverse mapping of $g$,

$$g^{-1} : [0, D_{KL}(1 - D \,\|\, p)] \to [1 - D, p],$$

is well-defined and maps $F$ to $u$.

The Kullback-Leibler divergence is $D_{KL}(u \,\|\, p) = u\log\frac{u}{p} + (1 - u)\log\frac{1 - u}{1 - p}$.

[Figure: FER versus $p$; curves: mBM-1(RDE,11) and Approximation.]

Fig. 4. Performance of mBM-1(RDE,11) and its approximation $2^{-F}$, where $F$ is given in (42), for the (255,239) RS code over an $m$-SC($p$) channel.



Proof: We first see that $g(u)$ is a strictly convex function that achieves its minimum value at $u = p$, and therefore $g(u)$ is strictly decreasing over $[1 - D, p]$. Thus, the mapping $g : [1 - D, p] \to [0, D_{KL}(1 - D \,\|\, p)]$ is one-to-one. From the analysis in Appendix D, one can also see that $g$ is onto. ■

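Theorem 6: In order to achieve a rate-distortion exponent of $F \in [0, N D_{KL}(1 - D/N \,\|\, p)]$, the minimum number of decoding attempts required for mBM-1 is $2^R$, where

$$R = N\left[H\big(g^{-1}(F/N)\big) - H\big(g^{-1}(F/N) + D/N - 1\big)\right]^+,$$

with $g$ defined as in Lemma 8 using the per-component distortion $D/N$ in place of $D$.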

Proof: We also note that the rate, distortion and exponent for each source component are $R/N$, $D/N$ and $F/N$, respectively. Combining all the cases in Appendix D, we have

$$\frac{R}{N} = \left[H\big(g^{-1}(F/N)\big) - H\big(g^{-1}(F/N) + D/N - 1\big)\right]^+$$

and the theorem follows. ■
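Theorem 6 inverts the relation of Theorem 5, and the same bisection idea applies to $g$, which Lemma 8 shows is strictly decreasing on $[1 - D, p]$. The Python sketch below assumes $1 - D/N < p$ so that this interval is nonempty; `g_inv` and `mbm1_min_rate` are illustrative names, and $p = 0.985$ is again an arbitrary value.

```python
import math


def h2(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def kl2(u, p):
    """Binary divergence D_KL(u || p) in bits, for 0 < p < 1."""
    out = 0.0
    if u > 0:
        out += u * math.log2(u / p)
    if u < 1:
        out += (1 - u) * math.log2((1 - u) / (1 - p))
    return out


def g_inv(f, d, p, iters=200):
    """Inverse of g(u) = D_KL(u || p) on [1 - d, p]; assumes 1 - d < p."""
    if not 0 <= f <= kl2(1 - d, p):
        raise ValueError("f must lie in [0, D_KL(1 - d || p)]")
    lo, hi = 1 - d, p                 # g decreases from D_KL(1 - d || p) to 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl2(mid, p) > f:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mbm1_min_rate(F, N, D, p):
    """R = N [H(u) - H(u + D/N - 1)]^+ with u = g^{-1}(F/N)."""
    u = g_inv(F / N, D / N, p)
    return N * max(h2(u) - h2(u + D / N - 1), 0.0)


# Minimum number of attempts 2^R for exponent F = 10 (illustrative p).
R = mbm1_min_rate(10, 255, 17, 0.985)
attempts = 2 ** R
```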

Remark 12: In Fig. 4, we simulate the performance of mBM-1(RDE,11) for the (255,239) RS code over an $m$-SC channel. One curve reflects the simulated frame-error rate (FER) and the other is the approximation $2^{-F}$, where $F$ is given in (42) with $R = 11$.

VI. Some Extensions
A. Erasure patterns using covering codes 

The RD framework we use is most suitable when $N \to \infty$. For a finite $N$, choosing random codes for only a few LRPs can be risky. We can instead use good covering codes to handle these LRPs. In the scope of covering problems, one can use an $\ell$-ary $t_c$-covering code (e.g., a perfect Hamming or Golay code) with covering radius $t_c$ to cover the whole space of $\ell$-ary vectors of the same length. The covering may still work well if the distortion measure is close to, but not exactly equal to, the Hamming distortion. The method of using covering codes in the LRPs was proposed earlier in [36] to choose the test patterns in iterative bounded distance decoding algorithms for binary linear block codes.

In order to take care of up to the $\ell$ most likely symbols at each of the $n_c$ LRPs of an $(N, K)$ RS code, we consider an $(n_c, k_c)$ $\ell$-ary $t_c$-covering code whose codeword alphabet is $\mathbb{Z}_{\ell+1} \setminus \{0\} = \{1, 2, \ldots, \ell\}$. Then, we give a definition of the (generalized) error patterns and erasure patterns for this case. In order to draw similarities between this case and the previous cases, we still use the terminology "generalized erasure pattern" and shorten it to erasure pattern even if errors-only decoding is used. For errors-only decoding, Condition 1 for successful decoding becomes $2e < N - K + 1$, where $e$ is the number of symbol errors.

Definition 8: (Error patterns and erasure patterns for errors-only decoding) Let us define $x^N \in \mathbb{Z}_{\ell+1}^N$ as an error pattern where, at index $i$, $x_i = j$ implies that the $j$-th most likely symbol is correct for $j \in \{1, 2, \ldots, \ell\}$, and $x_i = 0$ implies that none of the first $\ell$ most likely symbols is correct. Let $\hat{x}^N \in \{1, 2, \ldots, \ell\}^N$ be an erasure pattern where, at index $i$, $\hat{x}_i = j$ implies that the $j$-th most likely symbol is chosen as the hard-decision symbol for $j \in \{1, 2, \ldots, \ell\}$.

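Proposition 4: If we choose the letter-by-letter distortion measure $\delta : \mathbb{Z}_{\ell+1} \times (\mathbb{Z}_{\ell+1} \setminus \{0\}) \to \mathbb{R}_{\ge 0}$ defined by $\delta(x, \hat{x}) = [A]_{x,\hat{x}}$ in terms of the $(\ell+1) \times \ell$ matrix

$$A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{pmatrix}, \tag{43}$$

then the condition for successful errors-only decoding becomes

$$d(x^N, \hat{x}^N) < \tfrac{1}{2}(N - K + 1). \tag{44}$$

Proof: It follows directly from

$$d(x^N, \hat{x}^N) = \sum_{k=1}^{\ell} \sum_{j=0,\, j \neq k}^{\ell} \chi_{j,k} = e,$$

where $\chi_{j,k}$ is the number of indices $i$ with $x_i = j$ and $\hat{x}_i = k$, so that the double sum counts exactly the $e$ symbol errors. ■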
Remark 13: If we delete the first row, which corresponds to the case where none of the first $\ell$ most likely symbols is correct, then the distortion measure is exactly the Hamming distortion.
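A small Python sketch of the distortion measure (43) and condition (44) follows, assuming symbol ranks are encoded as integers exactly as in Definition 8 (0 for "none of the top $\ell$", and $1, \ldots, \ell$ otherwise). The function names are ours.

```python
def distortion_matrix(ell):
    """(ell + 1) x ell matrix A of (43).

    Row x = 0: none of the top-ell symbols is correct, so every choice is an error.
    Row x = j >= 1: the j-th most likely symbol is correct.
    Column xhat = k (k = 1..ell): the k-th most likely symbol is the hard decision.
    """
    A = [[1] * ell]
    for j in range(1, ell + 1):
        A.append([0 if k == j else 1 for k in range(1, ell + 1)])
    return A


def distortion(x, xhat, ell):
    """d(x^N, xhat^N): sum of the per-position entries [A]_{x_i, xhat_i}."""
    A = distortion_matrix(ell)
    return sum(A[xi][xh - 1] for xi, xh in zip(x, xhat))


def errors_only_succeeds(x, xhat, N, K, ell):
    """Condition (44): d(x^N, xhat^N) < (N - K + 1) / 2."""
    return distortion(x, xhat, ell) < (N - K + 1) / 2
```

With $\ell = 2$ and the (255,239) code, eight symbol errors satisfy (44) while nine do not, matching the errors-only radius $\lfloor (N-K)/2 \rfloor = 8$.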

Split covering approach: We can break an error pattern $x^N$ into two sub-error patterns $x^{(LRP)} = x_{\sigma(1)} x_{\sigma(2)} \cdots x_{\sigma(n_c)}$ of $n_c$ least reliable positions and $x^{(MRP)} = x_{\sigma(n_c+1)} \cdots x_{\sigma(N)}$ of $N - n_c$ most reliable positions. Similarly, we can break an erasure pattern $\hat{x}^N$ into two sub-erasure patterns $\hat{x}^{(LRP)} = \hat{x}_{\sigma(1)} \hat{x}_{\sigma(2)} \cdots \hat{x}_{\sigma(n_c)}$ and $\hat{x}^{(MRP)} = \hat{x}_{\sigma(n_c+1)} \cdots \hat{x}_{\sigma(N)}$. Let $z_{n_c}$ be the number of positions in the $n_c$ LRPs where none of the first $\ell$ most likely symbols is correct, or

$$z_{n_c} = \left|\{ i = 1, 2, \ldots, n_c : x_{\sigma(i)} = 0 \}\right|.$$

If we assign the set of all sub-erasure patterns $\hat{x}^{(LRP)}$ to be an $(n_c, k_c)$ $t_c$-covering code, then

$$\min_{\hat{x}^{(LRP)}} d\big(x^{(LRP)}, \hat{x}^{(LRP)}\big) \le t_c + z_{n_c}$$

because this covering code has covering radius $t_c$. Since

$$d(x^N, \hat{x}^N) = d\big(x^{(LRP)}, \hat{x}^{(LRP)}\big) + d\big(x^{(MRP)}, \hat{x}^{(MRP)}\big),$$

in order to increase the probability that condition (44) is satisfied, we want to make $d\big(x^{(MRP)}, \hat{x}^{(MRP)}\big)$ as small as possible by the use of the RD approach. The following proposition summarizes how to generate a set of $2^R$ erasure patterns for multiple runs of errors-only decoding.

Proposition 5: In each erasure pattern, the letter sequence at the $n_c$ LRPs is set to be a codeword of an $(n_c, k_c)$ $\ell$-ary $t_c$-covering code. The letter sequence of the remaining $N - n_c$ MRPs is generated randomly by the RD method (see Section II-C) with rate $R_{\mathrm{MRPs}} = R - k_c \log_2 \ell$ and the distortion measure in (43). Since this covering code has $\ell^{k_c}$ codewords, the total rate is $R_{\mathrm{MRPs}} + \log_2 \ell^{k_c} = R$.

Example 10: For a (7,4,3) binary Hamming code, which has covering radius $t_c = 1$, we take care of the 2 most likely symbols at each of the 7 LRPs. We see that 1001001 is a codeword of this Hamming code and then form erasure patterns $1001001\,\hat{x}_8 \hat{x}_9 \cdots \hat{x}_N$ with the assumption that the positions are written in increasing reliability order. The $2^{R-4}$ sub-erasure patterns $\hat{x}_8 \hat{x}_9 \cdots \hat{x}_N$ are generated randomly using the RD approach with rate $R - 4$.
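The construction of Proposition 5 and Example 10 can be sketched in Python as follows. Several choices here are assumptions made only for illustration: the particular parity-check matrix `H` (with it, 1001001 is a codeword, as in Example 10), the labeling of code bits 0/1 as symbol ranks 1/2, and the i.i.d. sampling of each MRP tail with a hypothetical per-position probability `q_mrp`, which stands in for the RD-generated random codebook of Section II-C rather than reproducing it.

```python
import random
from itertools import product

# Parity-check matrix of a (7,4) binary Hamming code (one common choice).
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]


def hamming74_codewords():
    """All 16 binary words c with H c^T = 0 (mod 2)."""
    return [c for c in product((0, 1), repeat=7)
            if all(sum(h * b for h, b in zip(row, c)) % 2 == 0 for row in H)]


def erasure_patterns(N, R, q_mrp, seed=0):
    """2^R erasure patterns in the spirit of Proposition 5 (ell = 2, n_c = 7, k_c = 4).

    Code bits 0/1 become ranks 1/2 (assumed labeling). Each MRP tail keeps
    rank 1 at MRP position i with probability q_mrp[i], a hypothetical
    stand-in for the RD-generated codebook.
    """
    if len(q_mrp) != N - 7 or R < 4:
        raise ValueError("need len(q_mrp) == N - 7 and R >= 4")
    rng = random.Random(seed)
    heads = [tuple(b + 1 for b in c) for c in hamming74_codewords()]
    tails = [tuple(1 if rng.random() < q else 2 for q in q_mrp)
             for _ in range(2 ** (R - 4))]     # rate R_MRPs = R - k_c log2(ell)
    return [h + t for h in heads for t in tails]


patterns = erasure_patterns(N=20, R=6, q_mrp=[0.9] * 13)
```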

Remark 14: While it also makes sense to use a covering code for the $n_c$ LRPs of the erasure patterns and set the rest to be letter 1 (i.e., choose the most likely symbol as the hard decision), our simulation results show that the performance can usually be improved by using a combination of a covering code and a random (i.e., generated by the RD approach) code. More discussions are presented in Section VII.
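The covering argument of the split covering approach can also be checked exhaustively for Example 10. The Python sketch below, which reuses the same illustrative parity-check matrix and rank labeling (both assumptions), verifies that every $x^{(LRP)} \in \{0, 1, 2\}^7$ lies within distortion $t_c + z_{n_c} = 1 + z_{n_c}$ of some Hamming-code sub-erasure pattern.

```python
from itertools import product

# Same illustrative (7,4) Hamming parity-check matrix; any Hamming code works.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]


def hamming74_codewords():
    """All 16 binary words c with H c^T = 0 (mod 2)."""
    return [c for c in product((0, 1), repeat=7)
            if all(sum(h * b for h, b in zip(row, c)) % 2 == 0 for row in H)]


def d43(x, xhat):
    """Distortion (43): a position costs 1 unless x_i = xhat_i (x_i = 0 always costs 1)."""
    return sum(1 for a, b in zip(x, xhat) if a == 0 or a != b)


def worst_excess():
    """max over x in {0,1,2}^7 of [min_c d(x, c) - z], where z = number of zeros in x."""
    code = [tuple(b + 1 for b in c) for c in hamming74_codewords()]  # ranks 1, 2
    worst = 0
    for x in product(range(3), repeat=7):
        best = min(d43(x, c) for c in code)
        worst = max(worst, best - x.count(0))
    return worst
```

A result of 1 confirms the bound $t_c + z_{n_c}$ with $t_c = 1$ for this code.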



B. A single decoding attempt 

In this subsection, we investigate a special case of our proposed RDE framework when $R = 0$ (i.e., the set of erasure patterns consists of one pattern). In this case, our proposed approach is related to another line of work where one tries to design a good erasure pattern for a single BM decoding or a good multiplicity matrix for a single ASD decoding lEl, IHl, Ull, HH. We will see that the RDE approach for $R = 0$ is quite similar to optimizing a Chernoff bound llT6l . lITSl or using the method of types lITTll . The main difference is that this approach starts from Condition 2 rather than its large-multiplicity approximation.

Lemma 9: When rate $R = 0$, the distribution matrix $Q$ that optimizes the RDE/RD function consists of only binary entries. Consequently, the random codebook using the proposed RDE approach (the set of erasure patterns) becomes a single deterministic pattern.

Sketch of proof: For each $(s, t)$ pair, the total rate is the sum of $N$ individual components, as seen in Proposition 2. Therefore, zero total rate implies that all components are zero. Thus, it suffices to show that if an arbitrary rate component (denoted as $R$ in the proof) is zero, then the corresponding column of $Q$ has all entries equal to 0 or 1.

For the RD case, it is well known [26, p. 27] that if $R = 0$ then the distortion is given by $D_{\max} = \min_k \sum_j p_j \delta_{jk}$, where $k^*$ is the argument that achieves this minimum, and the test-channel input distribution is

$$q_k = \begin{cases} 1 & \text{if } k = k^*, \\ 0 & \text{otherwise.} \end{cases}$$

Computing the RDE for the source distribution $p_j$ is equivalent to solving the RD problem for an appropriately tilted source distribution $p^*_j$. Therefore, the above property is inherited by the RDE as well. In particular, the distortion at $R = 0$ is given by $\min_k \sum_j p^*_j \delta_{jk}$ and the test-channel input distribution is supported on the singleton element that achieves this minimum.

This result can also be shown directly by solving for $Q$ while dropping the rate constraint from (7). ■
Let $G_k(D)$ be the large-deviation rate function for the distortion when the reconstruction symbol is fixed to $k$. It is well known that this can be computed using either a Chernoff bound or the method of types [20]. Both techniques result in the same function; for $a > 0$, it is described implicitly by



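$$D(a) = \frac{\sum_j p_j\, \delta_{j,k}\, 2^{a\delta_{j,k}}}{\sum_j p_j\, 2^{a\delta_{j,k}}}, \qquad G_k\big(D(a)\big) = a\,D(a) - \log_2 \sum_j p_j\, 2^{a\delta_{j,k}}.$$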



Theorem 7: The RDE function for $R = 0$ is equal to

$$F(0, D) = \max_k G_k(D).$$

Proof: Lemma 9 shows that the reconstruction distribution must be supported on a single element.
Since the exponential failure probability for any fixed reconstruction symbol follows from a standard 
large-deviations analysis, the only remaining degree of freedom is which symbol to use. Choosing the 
best symbol maximizes the RDE. ■ 
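For small alphabets, Theorem 7 can be evaluated directly. The Python sketch below assumes the usual Chernoff (Legendre-transform) form of $G_k(D)$ with base-2 logarithms and finds the optimizing $a$ by bisection on the tilted mean $D(a)$; the names `rate_function` and `rde_zero_rate` are hypothetical, and the search range `a_max` is an arbitrary cap.

```python
import math


def rate_function(p, col, D, a_max=60.0, iters=200):
    """G_k(D) = sup_{a >= 0} [a*D - log2 sum_j p_j 2^(a*delta_jk)].

    p[j] is the source distribution and col[j] = delta_{j,k} for one fixed
    reconstruction symbol k. The supremum is attained where the tilted mean
    D(a) equals D; D(a) increases with a, so bisection on a is used.
    """
    pairs = [(pj, d) for pj, d in zip(p, col) if pj > 0]

    def mgf(a):
        return sum(pj * 2.0 ** (a * d) for pj, d in pairs)

    def tilted_mean(a):
        return sum(pj * d * 2.0 ** (a * d) for pj, d in pairs) / mgf(a)

    d_top = max(d for _, d in pairs)
    if D <= tilted_mean(0.0):
        return 0.0                  # reaching D is typical: no exponential decay
    if D > d_top:
        return math.inf             # the distortion can never reach D
    if D == d_top:
        return -math.log2(sum(pj for pj, d in pairs if d == d_top))
    lo, hi = 0.0, a_max
    for _ in range(iters):
        a = (lo + hi) / 2
        if tilted_mean(a) < D:
            lo = a
        else:
            hi = a
    a = (lo + hi) / 2
    return a * D - math.log2(mgf(a))


def rde_zero_rate(p, delta, D):
    """Theorem 7: F(0, D) = max_k G_k(D), taken over the columns of delta."""
    return max(rate_function(p, col, D) for col in zip(*delta))


# mBM-1 with Pr(X = 1) = 0.9: rows x = 0, 1; columns xhat = 0 (erase), 1 (keep).
F0 = rde_zero_rate([0.1, 0.9], [[1, 2], [1, 0]], 0.5)
```

For this example, erasing everything always fails at $D = 0.5$, so the maximum is attained by keeping every symbol and equals $D_{KL}(0.75 \,\|\, 0.9)$, in agreement with Case 2 of Appendix D.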

Remark 15: This means that the single decoding attempt with the best error exponent can be computed as a special case of the RDE approach. Simplifying our proposed algorithm to use the single Lagrange multiplier $a$ leads to an algorithm that is very similar to the one proposed in lITTll . It also seems unlikely that this new algorithm will provide any significant gains in either performance or complexity.

VII. Simulation results 

In this section, we present simulation results on the performance of RS codes over an AWGN channel with either BPSK or 256-QAM as the modulation format. In all the figures, the curve labeled mBM-1 corresponds to standard errors-and-erasures BM decoding with multiple erasure patterns. For $\ell > 1$, the curves labeled mBM-$\ell$ correspond to errors-and-erasures BM decoding with multiple decoding trials using both erasures and the top-$\ell$ symbols. The curves labeled mASD-$\mu$ correspond to multiple ASD decoding trials with maximum multiplicity $\mu$. The number of decoding attempts is $2^R$, where $R$ is given in parentheses in each algorithm's acronym (e.g., mBM-2(RD,11) uses the RD approach with $R = 11$ while mBM-2(RDE,10) uses the RDE approach with $R = 10$). Note that not all the algorithms listed in this section have the same complexity unless stated explicitly.

In Fig. 5, the RD curves are shown for various algorithms using the RD approach at $E_b/N_0 = 5.2$ dB where BPSK is used. For the (255,239) RS code, the fixed threshold for decoding is $D = N - K + 1 = 17$. Therefore, one might expect that algorithms whose average distortion is less than 17 should have a frame error rate (FER) less than 1/2. The RD curve allows one to estimate the number of decoding patterns required to achieve this FER. Notice that the mBM-1 algorithm at rate 0, which is very similar to conventional BM decoding, has an expected distortion of roughly 24. For this reason, the FER for conventional decoding is close to 1. The RD curve tells us that trying roughly $2^{16}$ (i.e., $R = 16$) erasure patterns would reduce the FER to roughly 1/2 because this is where the distortion drops down to 17. Likewise, the mBM-2 algorithm using rate $R = 11$ has an expected distortion of less than 14. So we expect (and our simulations confirm) that the FER should be less than 1/2.

Fig. 5. A realization of RD curves at $E_b/N_0 = 5.2$ dB for various decoding algorithms for the (255,239) RS code over an AWGN channel.

One weakness of this RD approach is that RD describes only the average distortion and does not directly consider the probability that the distortion is greater than 17. Still, we can make the following observations from the RD curve. Even at high rates (e.g., $R > 5$), we see that the distortion $D$ achieved by mBM-2 is roughly the same as that of mBM-3, mASD-2, and mASD-3 but smaller than that of mASD-2a (see Example 4) and mBM-1. This implies that, for this RS code, mBM-2 using the RD approach is no worse than the more complicated ASD-based approaches for a wide range of rates (i.e., $5 < R < 35$). This is also true if the RDE approach is used, as can be seen in Fig. 6, which depicts the trade-off between rate $R$ and exponent $F$ for various algorithms at $E_b/N_0 = 6$ dB. For this RS code, ASD-based approaches have a better exponent than mBM-2 at low rates (i.e., a small number of decoding trials) and have roughly the same exponent for rates $R > 5$.

Fig. 6. A realization of RDE curves at $E_b/N_0 = 6$ dB for various decoding algorithms for the (255,239) RS code over an AWGN channel.

In Fig. 7, a plot of the FER versus $E_b/N_0$ is shown for the (255,239) RS code over an AWGN channel
with BPSK as the modulation format. The conventional HDD and the GMD algorithms have modest
performance since they use only one or a few decoding attempts. Choosing $R = 11$ allows us to make
fair comparisons with SED(12,12). With the same number of decoding trials, mBM-2(RD,11) outperforms
SED(12,12) by 0.3 dB at FER= 10^^. Even mBM-2(RD,7), with many fewer decoding trials, outperforms 
both SED(12,12) and the KV algorithm with $\mu = \infty$. Among all our proposed algorithms using the RD approach with rate $R = 11$, mBM2-HM74(RD,11) achieves the best performance. This algorithm uses the Hamming (7,4) covering code for the 7 LRPs and the RD approach for the remaining codeword positions. Meanwhile, small differences in the performance among mBM-2(RD,11), mBM-3(RD,11), mASD-2(RD,11), and mASD-3(RD,11) suggest that: (i) taking care of the 2 most likely symbols at each codeword position is good enough for multiple decoding of this RS code and (ii) multiple runs of errors-and-erasures decoding are generally almost as good as multiple runs of ASD decoding. Recall that this result is also correctly predicted by the RD analysis. When the RDE approach is used, mBM-2(RDE,11) still has roughly the same performance as the more complex mASD-3(RDE,11). One can also observe that these two algorithms using the RDE approach achieve better performance than mBM-2(RD,11) and mBM2-HM74(RD,11), which use the RD approach. We also simulate our proposed algorithm at $R = \log_2 9$ to compare with the GMD algorithm. While both mBM-2(RDE,$\log_2 9$) and the GMD algorithm use the same number of 9 errors-and-erasures decoding attempts, mBM-2(RDE,$\log_2 9$) yields roughly a 0.1 dB gain. The simulation results show that, at this low rate $R = \log_2 9$, mASD-3 has a larger gain over mBM-2 than at a higher rate $R = 11$. This phenomenon can be predicted from Fig. 6, where mASD-3 starts to achieve a larger exponent $F$ at small values of $R$.

Fig. 7. Performance of various decoding algorithms for the (255,239) RS code using BPSK over an AWGN channel.

To compare with the Chase-type approach (LCC) used in [9], in Fig. 7 we also consider the mBM2-HM74(4) algorithm that uses the Hamming (7,4) covering code for the 7 LRPs and the hard-decision pattern for the remaining codeword positions. This shows that, for the (255,239) RS code, mBM2-HM74 achieves better performance than LCC(4) with the same number ($2^4$) of decoding attempts. For the (458,410) RS code considered in Fig. 9, one can also observe that the group of algorithms that we propose has better performance than LCC(10) with the same number ($2^{10}$) of decoding attempts. However, the implementation complexity of LCC(10) may be lower than that of the algorithms proposed here due to its clever techniques that reduce the decoding complexity per trial. It is also interesting to note that the method proposed here, based on covering codes and random codebook generation, is also compatible with some of the fast techniques used by LCC decoding.

Fig. 8. Performance of various decoding algorithms for the (255,239) RS code using 256-QAM over an AWGN channel.

We also performed simulations using QAM, and Fig. 8 shows the FER versus $E_b/N_0$ performance of the
same (255,239) RS code transmitted over an AWGN channel with 256-QAM modulation. At FER^IO-"^, 
our proposed algorithms mBM-2(RD,10) and mBM-2(RDE,10) achieve a 0.3-0.4 dB gain over SED(11,10) (with the same complexity) and also outperform KV($\mu = \infty$). At $R = 10$, mBM-2 still achieves roughly the same performance as mASD-3.

Fig. 9. Performance of various decoding algorithms for the (458,410) RS code over $\mathbb{F}_{2^{10}}$ using BPSK over an AWGN channel.

In Fig. 9, a plot of the FER versus $E_b/N_0$ is shown for the (458,410) RS code, which has a longer block length. In this plot, BPSK is used as the modulation format and we also focus on rate $R = 10$. With algorithms that use the RD approach, mBM-2(RD,10) still has approximately the same performance as mBM-3(RD,10), mASD-2(RD,10), and mASD-3(RD,10). However, when the RDE approach is employed, algorithms that run multiple ASD decoding attempts have a noticeable gain over algorithms that use multiple runs of BM decoding. The performance gain of the RDE approach (over the RD approach) is small, but can be seen easily by comparing mASD-3(RDE,10) to mASD-3(RD,10). As a reference, we also plot the performance of KV(4.99), which corresponds to the proportional KV algorithm [32] with the scaling factor 4.99.

In Fig. 10, the same setting is used as in Fig. 9. As can be seen in the figure, KV($\mu = \infty$) achieves better performance than mASD-3(RDE,10) and mBM-2(RDE,10). However, by considering higher $\mu$, our algorithms using the heuristic method, mASDo-10(RDE,10) and mASDo-20(RDE,10), can outperform KV($\mu = \infty$).

Fig. 10. Performance of various decoding algorithms for the (458,410) RS code over $\mathbb{F}_{2^{10}}$ using BPSK over an AWGN channel.

To target RS codes of lower rate, we also ran simulations of the (255,127) RS code over an AWGN channel with BPSK modulation, and the results can be seen in Fig. 11. While mBM-2(RDE,6), mBM-2(RD,6), SED(7,6) and GMD all use the same number of about 64 errors-and-erasures decoding attempts, our proposed mBM-2 algorithms outperform the other two algorithms. As seen in the plot, mASD-3(RDE,6) has quite a large gain over mBM-2(RD,6), which is reasonable since ASD decoding is known to perform very well compared to BM decoding with low-rate RS codes. In this figure, KV(3.99) denotes the proportional KV algorithm [32] with the scaling factor 3.99 and therefore with maximum multiplicity $\mu = 3$. While mASD-3(RDE,6) with 64 decoding attempts outperforms KV(3.99) as expected, the small
gain of roughly 0.5 dB at FER=10^^ suggests that with low-rate RS codes, one might prefer increasing 
$\mu$ in a single ASD decoding attempt to running multiple ASD decoding attempts of a lower $\mu$.



Fig. 11. Performance of various decoding algorithms for the (255,127) RS code using BPSK over an AWGN channel.

In Fig. 12, we show the FER versus $E_s/N_0$ performance for the (255,191) RS code using 256-QAM. Again, our proposed algorithm mBM-2(RDE,5) performs favorably compared to SED(6,6) and GMD with the same number of about 32 errors-and-erasures decoding attempts. Under this setup, mASD-2(RDE,5) and mASD-3(RDE,5) achieve significant gains over mBM-2(RDE,5). Our proposed mASD-3(RDE,11) and mASD-3(RDE,5) algorithms have roughly the same performance as the proportional KV algorithm with the scaling factors 12.99 and 6.99, respectively.

To compare with the iterative erasure and error decoding (IEED) algorithm proposed in [8], we also conducted simulations of the (255,223) RS code over an AWGN channel using BPSK, and the results are shown in Fig. 13. With the same number of about 17 errors-and-erasures decoding attempts, our proposed mBM-2(RDE,$\log_2 17$) algorithm outperforms both the GMD and 17-IEED algorithms. In fact,
at FER smaller than 10^^, mBM-2(RDE,log2 17) has roughly the same performance as 32-IEED which 
needs to use 32 decoding attempts. Meanwhile, mBM-2(RDE,5), which uses 32 decoding attempts, performs as well as 112-IEED, where 112 decoding attempts are required.

Fig. 12. Performance of various decoding algorithms for the (255,191) RS code using 256-QAM over an AWGN channel.

VIII. Conclusion 

A unified framework based on rate-distortion (RD) theory has been developed to analyze multiple decoding trials, with various algorithms, of RS codes in terms of performance and complexity. An important contribution of this paper is the connection that is made between the complexity and performance (in an asymptotic sense) of these multiple-decoding algorithms and the rate-distortion of an associated RD problem. Based on this analysis, we propose two solutions: the first is based on the RD function and the second on the RD exponent (RDE). The RDE analysis shows that this approach has
several advantages. Firstly, the RDE approach achieves a near optimal performance-versus-complexity 
trade-off among algorithms that consider running a decoding scheme multiple times (see Remark [T}. 
Secondly, it helps estimate the error probability using exponentially tight bounds for $N$ large enough. Further, we have shown that covering codes can also be combined with the RD approach to mitigate the suboptimality of random codes when the effective block length is not large. As part of this analysis, we also present numerical and analytical computations of the RD and RDE functions for sequences of i.n.d. sources. Finally, the simulation results show that our proposed algorithms based on the RD and RDE approaches achieve a better performance-versus-complexity trade-off than previously proposed algorithms. One key result is that, for the (255,239) RS code, multiple-decoding using the standard Berlekamp-Massey algorithm (mBM) is as good as multiple-decoding using more complex algebraic soft-decision algorithms (mASD). However, for the (458,410) RS code, the RDE approach improves the performance of mASD algorithms beyond that of mBM decoding.

Fig. 13. Performance of various decoding algorithms for the (255,223) RS code using BPSK over an AWGN channel.

Simulation results suggest an interesting conjecture: for moderate-rate RS codes, multiple ASD decoding attempts with small $\mu$ are preferred, while for low-rate RS codes, a single ASD decoding with large $\mu$ may be preferred. This conjecture remains open for future research. Our future work will also focus on extending this framework to analyze multiple decoding attempts for intersymbol interference channels. In this case, it will be appropriate for the decoder to consider multiple candidate error events during decoding. Extending the RD and RDE approaches directly to this case is not straightforward since computing the RD and RDE functions for Markov sources in the large distortion regime is still an open problem. Another interesting extension is to use clever techniques to reuse the computations from one stage of errors-and-erasures decoding to the next in order to lower the complexity per decoding trial (e.g.,
H). 



Appendix A 
Proof of Corollary 2



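Proof: Using the formula in [26, p. 27], we have

$$D_{\max} = \sum_{i=1}^{N} \min_k \sum_j p_{i,j}\,\delta_{jk}.$$

For mBM-$\ell$ with the distortion matrix in (4), we have $\sum_{j=0}^{\ell} p_{i,j}\delta_{jk} = \sum_{j \neq k} 2p_{i,j} = 2(1 - p_{i,k})$ for $k \ge 1$ and $\sum_{j=0}^{\ell} p_{i,j}\delta_{j0} = \sum_{j=0}^{\ell} p_{i,j} = 1$. Therefore,

$$D_{\max}(\text{mBM-}\ell) = \sum_{i=1}^{N} \min\Big\{1, \min_{k=1,\ldots,\ell} 2(1 - p_{i,k})\Big\} = \sum_{i=1}^{N} \min\{1, 2(1 - p_{i,1})\}$$

since $p_{i,1} = \max_{k \ge 1}\{p_{i,k}\}$.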



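Similarly, for mASD-$\mu$ with the distortion matrix $\Delta_\mu$ in (24), we have

$$\sum_{j=0}^{\ell} p_{i,j}\,\delta_{jk} = p_{i,0}\,\rho_{k,\mu} + \sum_{j=1}^{\ell} p_{i,j}\left(\rho_{k,\mu} - \frac{2m_{j,k}}{\mu}\right) = \rho_{k,\mu} - \sum_{j=1}^{\ell} \frac{2m_{j,k}}{\mu}\, p_{i,j}$$

for $k = 1, \ldots, T$. Since multiplicity type 1 is always defined to be $(\mu, 0, \ldots, 0)$, we have $\rho_{1,\mu} = 2$ and, consequently,

$$\sum_{j=0}^{\ell} p_{i,j}\,\delta_{j1} = 2(1 - p_{i,1}).$$

Therefore, we obtain

$$D_{\max}(\text{mASD-}\mu) = \sum_{i=1}^{N} \min\left\{ 2(1 - p_{i,1}),\ \min_{k=2,\ldots,T}\left(\rho_{k,\mu} - \sum_{j=1}^{\ell} \frac{2m_{j,k}}{\mu}\, p_{i,j}\right) \right\}.$$

If mASD-$\mu$ uses the multiplicity type $(0, 0, \ldots, 0)$, which is, for example, labeled as type $T$, then we have

$$\rho_{T,\mu} - \sum_{j=1}^{\ell} \frac{2m_{j,T}}{\mu}\, p_{i,j} = \rho_{T,\mu} = 1.$$

Consequently,

$$D_{\max}(\text{mASD-}\mu) = \sum_{i=1}^{N} \min\left\{ 1,\ 2(1 - p_{i,1}),\ \min_{k=2,\ldots,T-1}\left(\rho_{k,\mu} - \sum_{j=1}^{\ell} \frac{2m_{j,k}}{\mu}\, p_{i,j}\right) \right\} \le \sum_{i=1}^{N} \min\{1, 2(1 - p_{i,1})\} = D_{\max}(\text{mBM-}\ell),$$

and this completes the proof.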
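The mBM-$\ell$ half of this proof can be checked numerically. The Python sketch below builds the distortion matrix implied by the sums above (an erasure column with cost 1; otherwise cost 0 on a match and 2 on a mismatch), which is our reading of (4) rather than a copy of it, and compares the direct formula for $D_{\max}$ with the closed form. The example probabilities are arbitrary.

```python
def mbm_delta(ell):
    """mBM-ell distortion as implied by the proof: rows j = 0..ell, columns k = 0..ell.

    Column 0 (erasure) costs 1; column k >= 1 costs 0 if j == k and 2 otherwise.
    """
    return [[1] + [0 if j == k else 2 for k in range(1, ell + 1)]
            for j in range(ell + 1)]


def dmax_direct(P, delta):
    """D_max = sum_i min_k sum_j p_{i,j} delta_{jk}."""
    return sum(min(sum(row[j] * delta[j][k] for j in range(len(row)))
                   for k in range(len(delta[0])))
               for row in P)


def dmax_closed(P):
    """Closed form for mBM-ell: sum_i min{1, 2(1 - p_{i,1})}."""
    return sum(min(1.0, 2 * (1 - row[1])) for row in P)


# Rows are (p_{i,0}, p_{i,1}, p_{i,2}) with p_{i,1} the largest of p_{i,1..ell}.
P = [[0.05, 0.90, 0.05], [0.20, 0.50, 0.30], [0.40, 0.35, 0.25]]
```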



Appendix B 
Proof of Lemma 6

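Proof: With the notation $\bar{p} = 1 - p$, according to [26, p. 27] we have

$$D_{\min} = \bar{p}\,\min_k \delta_{0k} + p\,\min_k \delta_{1k} = 1 - p,$$

$$D_{\max} = \min_k\left(\bar{p}\,\delta_{0k} + p\,\delta_{1k}\right) = \min\{1, 2(1 - p)\}.$$

The function $R(D)$ is not defined for $D < D_{\min}$, and $R(D) = 0$ for $D \ge D_{\max}$. For the case $D_{\min} \le D < D_{\max}$, the rate-distortion function $R(D)$ is given by solving the following convex optimization problem:

$$\begin{aligned}
\min_{w}\quad & I(X; \hat{X}) \\
\text{subject to}\quad & w_{k|j} = \Pr(\hat{X} = k \mid X = j) \ge 0 \quad \forall j, k \in \{0, 1\}, \\
& w_{0|0} + w_{1|0} = 1, \\
& w_{0|1} + w_{1|1} = 1, \\
& \bar{p}\,w_{0|0} + p\,w_{0|1} + 2\bar{p}\,w_{1|0} = D,
\end{aligned}$$

where the mutual information is

$$I(X; \hat{X}) = \bar{p}\sum_k w_{k|0}\log\frac{w_{k|0}}{q_k} + p\sum_k w_{k|1}\log\frac{w_{k|1}}{q_k}$$

and the test-channel input probability distribution is

$$q_k = \Pr(\hat{X} = k) = \bar{p}\,w_{k|0} + p\,w_{k|1}.$$

We then form the Lagrangian

$$J(w) = I(X; \hat{X}) + \sum_j \gamma_j\left(w_{0|j} + w_{1|j} - 1\right) + \gamma\left(\bar{p}\,w_{0|0} + p\,w_{0|1} + 2\bar{p}\,w_{1|0} - D\right) - \sum_{j,k} \lambda_{jk}\, w_{k|j}$$

and the Karush-Kuhn-Tucker (KKT) conditions become

$$\begin{aligned}
& \frac{\partial J}{\partial w_{k|j}} = 0 && \forall j, k \in \{0, 1\}, \\
& w_{0|j} + w_{1|j} - 1 = 0 && \forall j \in \{0, 1\}, \\
& w_{k|j},\ \lambda_{jk} \ge 0 && \forall j, k \in \{0, 1\}, \\
& \lambda_{jk}\, w_{k|j} = 0 && \forall j, k \in \{0, 1\}.
\end{aligned}$$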

By [26, Lemma 1, p. 32], we only need to consider the following cases.

• Case 1: $w_{0|0} = w_{0|1} = 0$. In this case, we further have $w_{1|0} = w_{1|1} = 1$. This leads to $R = 0$ and $D = 2(1 - p) \ge D_{\max}$, which is a contradiction as we only consider $D \in [D_{\min}, D_{\max})$.

• Case 2: $w_{1|0} = w_{1|1} = 0$. In this case, we have $w_{0|0} = w_{0|1} = 1$. This leads to $R = 0$ and $D = 1 \ge D_{\max}$, which is also a contradiction.

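• Case 3: $w_{k|j} > 0$ for all $j, k \in \{0, 1\}$. In this case, we know $\lambda_{jk} = 0$ and then, from $\partial J / \partial w_{k|j} = 0$, we obtain

$$\bar{p}\left(\log\frac{w_{k|0}}{q_k} + \delta_{0k}\gamma\right) + \gamma_0 = 0 \quad \forall k \in \{0, 1\},$$

$$p\left(\log\frac{w_{k|1}}{q_k} + \delta_{1k}\gamma\right) + \gamma_1 = 0 \quad \forall k \in \{0, 1\}.$$

Equivalently, we have

$$w_{k|0} = q_k\, 2^{-\gamma_0/\bar{p}}\, 2^{-\delta_{0k}\gamma} \quad \forall k \in \{0, 1\},$$

$$w_{k|1} = q_k\, 2^{-\gamma_1/p}\, 2^{-\delta_{1k}\gamma} \quad \forall k \in \{0, 1\}.$$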

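Letting $\alpha = 2^{-\gamma}$, using $\delta_{00} = \delta_{10} = 1$, $\delta_{01} = 2$, $\delta_{11} = 0$ (as in the distortion constraint above), and noticing that $w_{0|j} + w_{1|j} = 1$ for all $j \in \{0, 1\}$, we get

$$w_{0|0} = \frac{q_0}{q_0 + q_1\alpha}, \quad w_{1|0} = \frac{q_1\alpha}{q_0 + q_1\alpha}, \quad w_{0|1} = \frac{q_0\alpha}{q_0\alpha + q_1}, \quad w_{1|1} = \frac{q_1}{q_0\alpha + q_1}.$$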



Putting this into the constraints

$$\bar{p}\,w_{0|0} + p\,w_{0|1} + 2\bar{p}\,w_{1|0} = D, \qquad q_0 = \bar{p}\,w_{0|0} + p\,w_{0|1}, \qquad q_1 = \bar{p}\,w_{1|0} + p\,w_{1|1},$$


Here we use some abuse of notation and still write the optimizing values in their old forms without a * notation. 



November 9, 2010 



DRAFT 



51 



we have a set of 3 equations involving 3 variables $\alpha, q_0, q_1$. Solving this gives us

$$\alpha = \frac{D + p - 1}{2 - (D + p)}, \qquad q_0 = \frac{2(1 - p) - D}{3 - 2(D + p)}, \qquad q_1 = \frac{1 - D}{3 - 2(D + p)}.$$

Therefore, we can obtain the optimizing $w_{k|j}$ and have

$$R = H(p) - H\!\left(\frac{\alpha}{1 + \alpha}\right) = H(p) - H(D + p - 1).$$

Hence, in all cases $R = [H(p) - H(D + p - 1)]^+$ and we conclude the proof. ■
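As an independent check of Lemma 6, the Python sketch below runs the standard Blahut-Arimoto iteration (a technique swapped in here for verification, not the method of the proof above) on the mBM-1 distortion with $\delta_{00} = \delta_{10} = 1$, $\delta_{01} = 2$, $\delta_{11} = 0$. The parameter `alpha` plays the role of $\alpha = 2^{-\gamma}$ above; the function name and iteration count are our choices.

```python
import math

# mBM-1 distortion: rows x = 0, 1 (hard decision wrong, right);
# columns xhat = 0, 1 (erased, kept).
DELTA = [[1.0, 2.0],
         [1.0, 0.0]]


def h2(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def blahut_arimoto(p, alpha, iters=500):
    """One (D, R) point of the binary mBM-1 source with Pr(X = 1) = p.

    The test channel is w_{k|j} proportional to q_k * alpha**delta_{jk},
    with alpha in (0, 1); q is then updated to the output marginal.
    """
    px = [1 - p, p]
    q = [0.5, 0.5]
    for _ in range(iters):
        w = []
        for j in range(2):
            row = [q[k] * alpha ** DELTA[j][k] for k in range(2)]
            z = sum(row)
            w.append([v / z for v in row])
        q = [sum(px[j] * w[j][k] for j in range(2)) for k in range(2)]
    D = sum(px[j] * w[j][k] * DELTA[j][k] for j in range(2) for k in range(2))
    R = sum(px[j] * w[j][k] * math.log2(w[j][k] / q[k])
            for j in range(2) for k in range(2) if w[j][k] > 0)
    return D, R


D, R = blahut_arimoto(0.8, 0.125)
```

With $p = 0.8$ and $\alpha = 0.125$, the iteration settles at $D + p - 1 = \alpha/(1 + \alpha) = 1/9$, and the computed rate agrees with $H(p) - H(D + p - 1)$.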

Appendix C 
Proof of Theorem 3

Proof: The objective here is to compute the RD function for a discrete source sequence $x^N$ of i.n.d. source components $X_i$. First, with the notations $p_{i,j} = \Pr(X_i = j)$ and $q_{i,j} = \Pr(\hat{X}_i = j)$ for $j \in \{0, 1\}$ and $i \in \{1, 2, \ldots, N\}$, Lemma 6 gives us the rate-distortion components

$$R_i(D_i) = \left[H(p_{i,1}) - H(D_i + p_{i,1} - 1)\right]^+$$

along with the test-channel input-probability distributions

$$q_{i,0} = \frac{2(1 - p_{i,1}) - D_i}{3 - 2(p_{i,1} + D_i)} \quad\text{and}\quad q_{i,1} = \frac{1 - D_i}{3 - 2(p_{i,1} + D_i)}$$

for each index $i$ of the codeword. The overall rate-distortion function is given by

$$R(D) = \min_{\sum_{i=1}^{N} D_i = D}\ \sum_{i=1}^{N} R_i(D_i) = \min_{\sum_{i=1}^{N} D_i = D}\ \sum_{i=1}^{N} \left[H(p_{i,1}) - H(D_i + p_{i,1} - 1)\right]^+,$$

which is a convex optimization problem.

Using Lagrange multipliers, we form the functional

$$J(D) = \sum_{i=1}^{N} \big(H(p_{i,1}) - H(D_i + p_{i,1} - 1)\big) + \gamma\left(\sum_{i=1}^{N} D_i - D\right)$$

and compute the derivatives

$$\frac{\partial J}{\partial D_i} = \log\frac{D_i + p_{i,1} - 1}{2 - D_i - p_{i,1}} + \gamma.$$




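The Kuhn-Tucker condition (see the restated version in [29, page 86]) then tells us that there is a $\gamma$ such that

$$\frac{\partial J}{\partial D_i} \begin{cases} = 0 & \text{if } R_i(D_i) > 0, \\ \le 0 & \text{if } R_i(D_i) = 0, \end{cases}$$

which is equivalent to

$$\frac{D_i + p_{i,1} - 1}{2 - D_i - p_{i,1}} \begin{cases} = 2^{-\gamma} & \text{if } H(p_{i,1}) - H(D_i + p_{i,1} - 1) > 0, \\ \le 2^{-\gamma} & \text{if } H(p_{i,1}) - H(D_i + p_{i,1} - 1) \le 0. \end{cases}$$

With the notations $\tilde{D}_i = D_i + p_{i,1} - 1$ and $\lambda = \frac{2^{-\gamma}}{1 + 2^{-\gamma}}$, this is equivalent to

$$\tilde{D}_i \begin{cases} = \lambda & \text{if } \lambda < \min\{p_{i,1}, 1 - p_{i,1}\}, \\ \le \lambda & \text{otherwise.} \end{cases}$$

Finally, it becomes

$$\tilde{D}_i = \begin{cases} \lambda & \text{if } \lambda < \min\{p_{i,1}, 1 - p_{i,1}\}, \\ \min\{p_{i,1}, 1 - p_{i,1}\} & \text{otherwise,} \end{cases}$$

where

$$\sum_{i=1}^{N} \tilde{D}_i = D + \sum_{i=1}^{N} p_{i,1} - N,$$

and we conclude the proof. ■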
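A brute-force comparison illustrates that the Kuhn-Tucker solution is indeed the minimum. The Python sketch below, for $N = 2$ and with illustrative names and probabilities, compares the water-filling allocation with a grid search over all splits $\tilde{D}_1 + \tilde{D}_2 = D + \sum_i p_{i,1} - N$.

```python
import math


def h2(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def total_rate(p1, dtilde):
    """sum_i [H(p_i1) - H(dtilde_i)]^+."""
    return sum(max(h2(p) - h2(d), 0.0) for p, d in zip(p1, dtilde))


def water_fill(p1, target, iters=200):
    """dtilde_i = min(lambda, min(p_i1, 1 - p_i1)) with sum_i dtilde_i = target."""
    caps = [min(p, 1 - p) for p in p1]
    lo, hi = 0.0, max(caps)
    for _ in range(iters):
        lam = (lo + hi) / 2
        if sum(min(lam, c) for c in caps) < target:
            lo = lam
        else:
            hi = lam
    lam = (lo + hi) / 2
    return [min(lam, c) for c in caps]


def grid_minimum(p1, target, steps=20000):
    """Brute-force minimum of the total rate over dtilde_1 + dtilde_2 = target."""
    best = math.inf
    for n in range(steps + 1):
        d1 = target * n / steps
        d2 = target - d1
        if d1 <= 0.5 and d2 <= 0.5:
            best = min(best, total_rate(p1, [d1, d2]))
    return best


p1, target = [0.9, 0.7], 0.25
wf = total_rate(p1, water_fill(p1, target))
bf = grid_minimum(p1, target)
```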

Appendix D 
Analysis of RDE Computation 

Consider a binary single source $X$ with $\Pr(X = 1) = p$ and $\Pr(X = 0) = 1 - p = \bar{p}$. According to [21], for any admissible $(R, D)$ pair we can find two parameters $s \ge 0$ and $t \le 0$ so that $F(R, D)$ can be parametrically evaluated as

$$F(R, D) = sR - stD + \max_{q}\big(-\log f(q)\big) = sR - stD - \log \min_{q} f(q),$$

where

$$f(q) = \bar{p}\left[\sum_k q_k\, 2^{t\delta_{0k}}\right]^{-s} + p\left[\sum_k q_k\, 2^{t\delta_{1k}}\right]^{-s}$$

and $R, D$ are given in terms of the optimizing $q^*$.

For the mBM-1 distortion measure and with $q_0 = 1 - q_1$, we have

$$f(q_1) = \bar{p}\big((1 - q_1)2^t + q_1 2^{2t}\big)^{-s} + p\big((1 - q_1)2^t + q_1\big)^{-s},$$

which is a convex function in $q_1$. Setting the derivative $\frac{df}{dq_1} = 0$ gives us

$$q_1 = \frac{1 + 2^t}{1 - 2^t}\left(\frac{1}{1 + 2^t} - \frac{\bar{p}^{\frac{1}{s+1}}}{2^{\frac{st}{s+1}}\, p^{\frac{1}{s+1}} + \bar{p}^{\frac{1}{s+1}}}\right) \triangleq \beta.$$

In order to minimize $f(q_1)$ over $q_1 \in [0, 1]$, we consider the following three cases, where the optimal $q_1^*$ is either on the boundary or at a point with zero gradient.

• Case 1: $0 \le p \le \frac{2^t}{1 + 2^t}$; then $\beta \le 0$. Since $f$ is convex, it is non-decreasing in the interval $[\beta, \infty)$ and therefore in the interval $[0, 1]$. Thus, the optimal $q_1^* = 0$, and we can also compute

$$D = 1; \quad R = 0; \quad F = 0 = D_{KL}(p \,\|\, p).$$

• Case 2: $\frac{1}{1 + 2^{t(2s+1)}} \le p \le 1$; then $\beta \ge 1$. Since $f$ is convex, it is non-increasing in the interval $(-\infty, \beta]$ and therefore in the interval $[0, 1]$. Thus, the optimal $q_1^* = 1$, and we get

$$D = \frac{2\bar{p}\,2^{-2st}}{\bar{p}\,2^{-2st} + p}; \quad R = 0; \quad F = D_{KL}(u \,\|\, p),$$

where in this case $u = 1 - D/2$. We can further see that $D \in [2(1 - p), 1]$ and $u \in [1 - D, p]$.

• Case 3: $\frac{2^t}{1 + 2^t} < p < \frac{1}{1 + 2^{t(2s+1)}}$; then $\beta \in (0, 1)$. In this case, the optimal $q_1^* = \beta$. We can find the optimizing test channel $w^*_{k|j}$ according to [21] and then obtain

$$D = \frac{2^t}{1 + 2^t} + 1 - u, \qquad R = H(u) - H(u + D - 1), \qquad F = D_{KL}(u \,\|\, p),$$

where

$$u = \frac{2^{\frac{st}{s+1}}\, p^{\frac{1}{s+1}}}{2^{\frac{st}{s+1}}\, p^{\frac{1}{s+1}} + \bar{p}^{\frac{1}{s+1}}}.$$

With this notation of $u$, we can express

$$q_1^* = \frac{1 - D}{3 - 2(u + D)} \quad\text{and}\quad q_0^* = \frac{2(1 - u) - D}{3 - 2(u + D)}.$$

We can see that $D \in (1 - p, 1)$. It can also be verified that, in this case, by varying $s$ and $t$, $u$ spans $(1 - D, 1 - D/2)$ and $R$ spans $(0, H(1 - D))$.
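The case analysis above can be sanity-checked numerically. The Python sketch below (illustrative names; $s > 0$ and $t < 0$ are assumed so that $\beta$ is defined) compares the closed-form $\beta$ with a grid minimizer of $f(q_1)$ and checks the Case 3 identity $q_1^* = (1 - D)/(3 - 2(u + D))$.

```python
def f(q1, p, s, t):
    """f(q1) of Appendix D for the mBM-1 distortion, with q0 = 1 - q1."""
    a = 2.0 ** t
    return ((1 - p) * ((1 - q1) * a + q1 * a * a) ** (-s)
            + p * ((1 - q1) * a + q1) ** (-s))


def beta(p, s, t):
    """Closed-form stationary point of f (needs s > 0 and t < 0)."""
    a = 2.0 ** t
    e = 1.0 / (s + 1)
    pbar = 1 - p
    ratio = pbar ** e / (2.0 ** (s * t * e) * p ** e + pbar ** e)
    return (1 + a) / (1 - a) * (1 / (1 + a) - ratio)


def case3_u_and_D(p, s, t):
    """u and D of Case 3."""
    a = 2.0 ** t
    e = 1.0 / (s + 1)
    num = 2.0 ** (s * t * e) * p ** e
    u = num / (num + (1 - p) ** e)
    return u, a / (1 + a) + 1 - u


def grid_argmin(p, s, t, n=100000):
    """Brute-force minimizer of f over a uniform grid on [0, 1]."""
    return min((k / n for k in range(n + 1)), key=lambda q: f(q, p, s, t))
```

For $s = 1$ and $t = -1$, Case 3 covers $1/3 < p < 8/9$, so $p = 0.6$ exercises the interior solution while $p = 0.2$ falls in Case 1.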